Wyner-Ziv Coding over Broadcast Channels: Digital Schemes

Jayanth Nayak, Ertem Tuncel, Deniz Gündüz
Abstract 

This paper addresses lossy transmission of a common source over a broadcast channel when there is correlated side information 
at the receivers, with emphasis on the quadratic Gaussian and binary Hamming cases. A digital scheme that combines ideas from 
the lossless version of the problem, i.e., Slepian-Wolf coding over broadcast channels, and dirty paper coding, is presented and 
analyzed. This scheme uses layered coding where the common layer information is intended for both receivers and the refinement 
information is destined only for one receiver. For the quadratic Gaussian case, a quantity characterizing the overall quality of each 
receiver is identified in terms of channel and side information parameters. It is shown that it is more advantageous to send the 
refinement information to the receiver with "better" overall quality. In the case where all receivers have the same overall quality, 
the presented scheme becomes optimal. Unlike its lossless counterpart, however, the problem eludes a complete characterization. 

I. Introduction 

Consider a sensor network of K + 1 nodes taking periodic measurements of a common phenomenon. We study the 
communication scenario in which one of the sensors is required to transmit its measurements to the other K nodes over 
a broadcast channel. The receiver nodes are themselves equipped with side information unavailable to the sender, e.g., 
measurements correlated with the sender's data. This scenario, which is depicted in Figure 1, can be of interest either by
itself or as part of a larger scheme where all nodes are required to broadcast their measurements to all the other nodes. Finding 
the capacity of a broadcast channel is a longstanding open problem, and thus, limitations of using separate source and channel 
codes in this scenario may never be fully understood. In contrast, a very simple joint source-channel coding strategy is optimal 
for the special case of lossless coding [19]. More specifically, it was shown in [19] that in Slepian-Wolf coding over broadcast 
channels (SWBC), as the lossless case was referred to, for a given source $X$, side information $Y_1, \ldots, Y_K$, and a broadcast
channel $p_{V_1 \cdots V_K|U}$, lossless transmission (in the Shannon sense) is possible with $\kappa$ channel uses per source symbol if and only
if there exists a channel input distribution $U$ such that

$$H(X|Y_k) < \kappa I(U;V_k) \qquad (1)$$

for $k = 1, \ldots, K$. In the optimal coding strategy, every typical source word $X^n(i)$ is randomly mapped to a channel codeword
$U^m(i)$, where $n$ and $m$ are such that $\kappa = m/n$. If (1) is satisfied, there exists a channel codebook such that with high probability,
there is a unique index $i$ for which $X^n(i)$ is jointly typical with the side information $Y_k^n$ and $U^m(i)$ is jointly typical with
the channel output $V_k^m$ simultaneously, at any receiver $k$. This result exhibits some striking features which are worth repeating
here. 

(i) The optimal coding scheme is not separable in the classical sense, but consists of separate components that perform 
source and channel coding in a broader sense. This results in the separation of source and channel variables as in (1).

Fig. 1. Block diagram for Wyner-Ziv coding over broadcast channels.



(ii) If the broadcast channel is such that the same input distribution achieves capacity for all individual channels, then (1)
implies that one can utilize all channels at full capacity. Binary symmetric channels and Gaussian channels are the widely 
known examples of this phenomenon. 

(iii) The optimal coding scheme does not explicitly involve binning, which is commonly used in network information theory. 
Instead, with the simple coding strategy of [19], each channel can be thought of as performing its own binning. More 
specifically, the channel output $V_k^m$ at each receiver can be viewed as corresponding to a virtual bin$^1$ containing all
source words $X^n(i)$ that map to channel codewords $U^m(i)$ jointly typical with $V_k^m$. In general, the virtual bins can
overlap and correct decoding is guaranteed by the size of the bins, which is about $2^{n[I(X;Y_k) - \epsilon]}$.
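
As a rough numerical illustration of condition (1), the sketch below computes the threshold bandwidth ratio at which (1) holds with equality at the most demanding receiver. It assumes a binary setting that is not part of the statement in [19]: a Ber(1/2) source, side information obtained through binary symmetric "virtual" channels with crossover probabilities `betas`, physical binary symmetric channels with crossover probabilities `ps`, and a uniform channel input. The function names are hypothetical.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def kappa_threshold_swbc(betas, ps):
    """Threshold kappa for condition (1) in an assumed binary example.

    Under the stated assumptions H(X|Y_k) = h2(betas[k]) and
    I(U;V_k) = 1 - h2(ps[k]), so receiver k needs
    kappa > h2(betas[k]) / (1 - h2(ps[k])); the maximum over k is returned.
    Channels with ps[k] = 0.5 have zero capacity and are rejected.
    """
    ratios = []
    for beta, p in zip(betas, ps):
        capacity = 1.0 - h2(p)
        if capacity <= 0.0:
            raise ValueError("a channel with crossover 0.5 carries no information")
        ratios.append(h2(beta) / capacity)
    return max(ratios)
```

Because the same uniform input maximizes every $I(U;V_k)$ here, each receiver is served at its own full capacity, which is the phenomenon noted in item (ii).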

In this paper, we consider the general lossy coding problem in which the reconstruction of the source at the receivers 
need not be perfect. We shall refer to this problem setup as Wyner-Ziv coding over broadcast channels (WZBC). We present 
a coding scheme for this scenario and analyze its performance in the quadratic Gaussian and binary Hamming cases. This 
scheme uses ideas from SWBC [19] and dirty paper coding (DPC) [3], [6] as a starting point. The SWBC scheme is modified 
a) to allow quantization of the source, and b) to handle channel state information (CSI) at the encoder by using DPC. The 
modification with DPC is then employed in a layered transmission scheme with K = 2 receivers, where there is common
layer (CL) information destined for both receivers and refinement layer (RL) information meant for only one of the receivers. 
The channel codewords corresponding to the two layers are superposed and the resultant interference is mitigated using DPC. 
We shall briefly discuss other possible layered schemes obtained by varying the encoding and the decoding orders of the two 
layers and using successive coding or DPC to counteract the interference, although for the bandwidth matched Gaussian and 
binary Hamming cases, we observe that these variants perform worse. 

$^1$The bins can also be viewed as exponentially sized lists and a similar strategy that interprets the decoding as the intersection of exponentially sized
lists was derived independently in [10] and [19]. Another alternative binning-based coding scheme that achieves the same performance using block Markov 
encoding and backward decoding can be found in [7]. 

DPC is used in this work in a manner quite different from the way it was used in [2], which concentrated on sending private
information to each receiver in a broadcast channel setting, where the information that forms the CSI and the information that 
is dirty paper coded are meant for different receivers. Therefore, although the DPC auxiliary codewords are decoded at one of 
the receivers, unlike in our scheme, this is of no use to that receiver. For our problem, this difference leads to an additional 
interplay in the choice of channel random variables. The DPC techniques in this work are most similar to those in [16], [20], 
where, as in our scheme, the CSI carries information about the source and hence decoding the DPC auxiliary codeword helps 
improve the performance. However, our results indicate a unique feature of DPC in the framework of WZBC. In particular, in 
our layered scheme, the optimal Costa parameter for the quadratic Gaussian problem turns out to be either 0 or 1. When it is
0, there is effectively no DPC, and when it is 1, the auxiliary codeword is identical to the channel input corrupted by the CSI. 
To the best of our knowledge, although the latter choice is optimal for binary symmetric channels, it has never been shown to 
be optimal for a Gaussian channel in a scenario considered before. 

When an appropriately defined "combined" channel and side information quality is constant at each receiver, the new scheme 
is shown to be optimal in the quadratic Gaussian case. We also derive conditions for the same phenomenon to occur in the 
binary Hamming case, although the expressions are not as elegant as in the quadratic Gaussian problem. Unlike in [19], 
however, the scheme that we derive is not always optimal. A simple alternative approach is to separate the source and channel 
coding. Both Gaussian and binary symmetric broadcast channels are degraded. Hence their capacity regions are known [4] and 
further, there is no loss of optimality in confining ourselves to two layer source coding schemes. The corresponding source and 
side information pairs are also degraded. Although a full characterization of the rate-distortion performance is available for the 
quadratic Gaussian case [17], only a partial characterization is available for the binary Hamming problem [15], [17]. In any 
case, we obtain an achievable distortion tradeoff of separate source and channel coding by combining the known rate-distortion 
results with the capacity results. For the quadratic Gaussian problem, we show that our scheme always performs at least as 
well as separate coding. The same phenomenon is numerically observed for the binary Hamming case. 

For the two examples we consider, a second alternative is uncoded transmission if there is no bandwidth expansion or 
compression. This scheme is optimal in the absence of side information at the receivers in both the quadratic Gaussian and 
binary Hamming cases. However, in the presence of side information, the optimality may break down. We show that, depending 
on the quality of the side information, our scheme can indeed outperform uncoded transmission as well. In particular, if the 
combined quality criterion chooses the worse channel as the refinement receiver (because it has much better side information), 
then our layered scheme outperforms uncoded transmission for the quadratic Gaussian problem. 

The paper is organized as follows. In Section II, we formally define the problem and present relevant past work. Our main
results are presented in Section III and Section IV, namely the extensions of the scheme in [19] that we develop for the lossy
scenario. We then analyze a layered scheme in particular for the quadratic Gaussian and binary Hamming cases in Sections V
and VI, respectively. For these cases, we compare the derived schemes with separate source and channel coding, and with
uncoded transmission. Section VII concludes the paper by summarizing the results and pointing to future work.



II. Background and Notation 
Let $(X, Y_1, \ldots, Y_K) \in \mathcal{X} \times \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_K$ be random variables denoting a source with independent and identically distributed
(i.i.d.) realizations. Source $X$ is to be transmitted over a memoryless broadcast channel defined by $p_{V_1 \cdots V_K|U}(v_1, \ldots, v_K|u)$,
$u \in \mathcal{U}$, $v_k \in \mathcal{V}_k$, $k = 1, \ldots, K$. Decoder $k$ has access to side information $Y_k$ in addition to the channel output $V_k$. Let single-letter
distortion measures $d_k : \mathcal{X} \times \hat{\mathcal{X}}_k \to [0, \infty)$ be defined at each receiver, i.e.,

$$d_k(x^n, \hat{x}_k^n) = \frac{1}{n} \sum_{j=1}^{n} d_k(x_j, \hat{x}_{kj})$$

for $k = 1, \ldots, K$.

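Definition 1: An $(m, n, \phi, \psi_1, \ldots, \psi_K)$ code consists of an encoder

$$\phi : \mathcal{X}^n \to \mathcal{U}^m$$

and decoders at each receiver

$$\psi_k : \mathcal{V}_k^m \times \mathcal{Y}_k^n \to \hat{\mathcal{X}}_k^n .$$

The rate of the code is $\kappa = m/n$ channel uses per source symbol.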

Definition 2: A distortion tuple $(D_1, \ldots, D_K)$ is said to be achievable at a rational rate $\kappa$ if for every $\epsilon > 0$, there exists
$n_0$ such that for all integers $m > 0$, $n \ge n_0$ with $m/n = \kappa$, there exists an $(m, n, \phi, \psi_1, \ldots, \psi_K)$ code satisfying

$$E\left[d_k(X^n, \hat{X}_k^n)\right] \le D_k + \epsilon$$

where $\hat{X}_k^n = \psi_k(V_k^m, Y_k^n)$ and $V_k^m$ denotes the channel output corresponding to $\phi(X^n)$.

In this paper, we present some general WZBC techniques and derive the corresponding achievable distortion regions. We 
study the performance of these techniques for the following cases. 

• Quadratic Gaussian: All source and channel variables are real-valued, and we use the notation A to denote the variance
of any Gaussian random variable A. The source and side information are jointly Gaussian and the channels are additive
white Gaussian, i.e., $V_k = U + W_k$ where $W_k$ is Gaussian and independent of $U$. There is an input power constraint
on the channel:

$$\frac{1}{m} \sum_{j=1}^{m} E\left[U_j^2\right] \le P$$

where $U^m = \phi(X^n)$. Without loss of generality, we assume that the variances satisfy $X = Y_1 = \cdots = Y_K = 1$ and that $Y_k = \rho_k X + N_k$ with
$N_k \perp X$ and $\rho_k \ge 0$. Thus, $N_k = 1 - \rho_k^2$ denotes the mean squared-error in estimating $X$ from $Y_k$, or equivalently, $Y_k$
from $X$ since $X = Y_k$. Reconstruction quality is measured by squared-error distance: $d_k(x, \hat{x}_k) = (x - \hat{x}_k)^2$.
• Binary Hamming: All source and channel alphabets are binary. The source is Ber(1/2), where Ber($\epsilon$) denotes the Bernoulli
distribution with $P[1] = \epsilon$. The channels are binary symmetric with transition probabilities $p_k$, i.e., $V_k = U \oplus W_k$ where
$W_k \sim$ Ber($p_k$) and $W_k$ and $U$ are independent, with $\oplus$ denoting modulo 2 addition (or the XOR operation). The side
information sequences at the receivers are also noisy versions of the source corrupted by passage through virtual binary
symmetric channels; that is, $Y_k = X \oplus N_k$ with $N_k \sim$ Ber($\beta_k$) and $N_k$ and $X$ independent. Reconstruction quality
is measured by Hamming distance: $d_k(x, \hat{x}_k) = x \oplus \hat{x}_k$.
The problems considered in [9], [13], [19] can all be seen as special cases of the WZBC problem. However, the quadratic 
Gaussian and the binary Hamming cases with non-trivial side information have never, to our knowledge, been analyzed before. 
Nevertheless, separate source and channel coding and uncoded transmission are obvious strategies. We shall evaluate the 
performance of these alternative strategies and present numerical comparisons with our proposed scheme. 

A. Wyner-Ziv Coding over Point-to-Point Channels 

Before analyzing the WZBC problem in depth, we shall briefly discuss known results for Wyner-Ziv coding over a point- 
to-point channel, i.e., the case K = 1. Since K = 1, we shall drop the subscripts that relate to the receiver. The Wyner-Ziv 
rate-distortion performance is characterized in [22] as 

$$D^{WZ}(R) = \min_{\substack{Z,\, g:\ Y - X - Z \\ I(X;Z|Y) \le R}} E[d(X, g(Z, Y))] \qquad (2)$$

where $Z \in \mathcal{Z}$ is an auxiliary random variable, and the capacity of the channel $p_{V|U}$ is well-known (cf. [4]) to be

$$C = \max_{U} I(U;V) .$$

It is then straightforward to conclude that combining separate source and channel codes yields the distortion

$$D = D^{WZ}(\kappa C). \qquad (3)$$

On the other hand, a converse result in [14] shows that even by using joint source-channel codes, one cannot improve the
distortion performance further than (3).

We are further interested in the evaluation of $D^{WZ}(R)$, as well as in the test channels achieving it, for the quadratic Gaussian
and binary Hamming cases. We will use similar test channels in our WZBC schemes.

1) Quadratic Gaussian: It was shown in [21] that the optimal backward test channel is given by

$$X = Z + S$$

where $Z$ and $S$ are independent Gaussians. For the rate we have$^2$

$$R \ge I(X;Z|Y) = \frac{1}{2} \log\left(1 - N + \frac{N}{S}\right). \qquad (4)$$

The optimal reconstruction is a linear estimate $g(Z, Y) = \frac{N}{N + S(1-N)} Z + \frac{\rho S}{N + S(1-N)} Y$, which yields the distortion

$$E[d(X, g(Z, Y))] = \frac{N S}{N + S(1-N)} \qquad (5)$$

and therefore,

$$D^{WZ}(R) = N 2^{-2R} . \qquad (6)$$

$^2$All logarithms are base 2.
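
The relation between the test channel and the Wyner-Ziv distortion-rate function can be checked numerically. The sketch below assumes the unit-variance source, side-information error variance $N$, and test-channel variance $S$ of the backward channel $X = Z + S$; the function names are illustrative only. It evaluates the rate $\frac{1}{2}\log(1 - N + N/S)$ and the MMSE $NS/(N + S(1-N))$, and confirms that together they trace out $N 2^{-2R}$.

```python
import math


def wz_rate_gaussian(N, S):
    """I(X;Z|Y) in bits for X = Z + S with Var(X) = 1 and Var(X|Y) = N."""
    return 0.5 * math.log2(1.0 - N + N / S)


def wz_distortion_gaussian(N, S):
    """MMSE of X given (Z, Y): precisions add, 1/D = 1/S + 1/N - 1."""
    return N * S / (N + S * (1.0 - N))


def d_wz_gaussian(N, R):
    """Quadratic Gaussian Wyner-Ziv distortion-rate function N * 2^(-2R)."""
    return N * 2.0 ** (-2.0 * R)


def variance_for_rate(N, R):
    """Test-channel variance S that spends exactly R bits (inverse of the rate)."""
    return N / (2.0 ** (2.0 * R) - 1.0 + N)


if __name__ == "__main__":
    N, S = 0.5, 0.25
    R = wz_rate_gaussian(N, S)
    print(R, wz_distortion_gaussian(N, S), d_wz_gaussian(N, R))
```

Setting $S = 1$ gives zero rate and distortion $N$, i.e., reconstruction from the side information alone.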



2) Binary Hamming: It was implicitly shown in [22] that the optimal auxiliary random variable $Z \in \mathcal{Z} = \{0, 1, \lambda\}$ is given
by

$$Z = E \circ (X \oplus S)$$

where $X$, $E$, $S$ are all independent, $E$ and $S$ are Ber($q$) and Ber($a$) with $0 \le q \le 1$ and $0 \le a \le \frac{1}{2}$, respectively, and $\circ$ is an
erasure operator, i.e.,

$$a \circ b = \begin{cases} \lambda & a = 0 \\ b & a = 1 . \end{cases}$$

This choice results in

$$I(X;Z|Y) = q\, r(a, \beta) \qquad (7)$$

where

$$r(a, \beta) = H_2(a * \beta) - H_2(a)$$

with $*$ denoting the binary convolution, i.e., $a * b = (1 - a)b + a(1 - b)$, and $H_2$ denoting the binary entropy function, i.e.,

$$H_2(p) = -p \log p - (1 - p) \log(1 - p).$$

It is easy to show that when $0 \le a, \beta \le \frac{1}{2}$, $r(a, \beta)$ is increasing in $\beta$ and decreasing in $a$.

Since $E[d(X, g(Z, Y))] = \Pr[X \ne g(Z, Y)]$ and $X \sim$ Ber(1/2), the corresponding optimal reconstruction function $g$ boils
down to a maximum likelihood estimator given by

$$g(z, y) = \arg\max_{x} p_{YZ|X}(y, z|x) = \arg\max_{x} p_{Z|X}(z|x)\, p_{Y|X}(y|x) = \begin{cases} y & z = \lambda \text{ or } z = y \\ z & z \ne \lambda,\ z \ne y \text{ and } \beta > a \\ y & z \ne \lambda,\ z \ne y \text{ and } \beta \le a . \end{cases}$$

The resultant distortion is given by

$$E[d(X, g(Z, Y))] = q \min\{a, \beta\} + (1 - q)\beta \qquad (8)$$

implying together with (7) that

$$D^{WZ}(R) = \min_{\substack{0 \le q \le 1,\ 0 \le a \le \beta: \\ q\, r(a, \beta) \le R}} \; q a + (1 - q)\beta \qquad (9)$$

where the extra constraint $a \le \beta$ is imposed because $a > \beta$ is a provably suboptimal choice. It also follows from the discussion
in [22] that there exists a critical rate $R_0(\beta)$ above which the optimal test channel assumes $q = 1$ and $0 \le a \le a_0(\beta) \le \beta$,
and below which it assumes $a = a_0(\beta)$ and $0 \le q \le 1$. The reason why we discussed other values of $(q, a)$ above is because
we will use the test channel in its most general form in all WZBC schemes.
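
The minimization over $(q, a)$ in the binary Hamming distortion-rate function has no closed form, but it is easy to approximate. The following sketch is a brute-force grid search, not the parametric solution of [22]; it assumes the rate expression $q\,r(a, \beta)$ and the distortion $q a + (1 - q)\beta$ with $0 \le a \le \beta$, and the function names and grid resolution are arbitrary choices made for illustration.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bconv(a, b):
    """Binary convolution a * b = (1 - a) b + a (1 - b)."""
    return (1 - a) * b + a * (1 - b)


def r(a, beta):
    """r(a, beta) = H2(a * beta) - H2(a)."""
    return h2(bconv(a, beta)) - h2(a)


def d_wz_binary(R, beta, grid=200):
    """Grid approximation (an upper bound) of the binary Wyner-Ziv D(R).

    Searches q in [0, 1] and a in [0, beta] on a uniform grid and keeps the
    smallest distortion q*a + (1-q)*beta whose rate q*r(a, beta) is at most R.
    """
    best = beta  # q = 0 costs no rate and yields distortion beta
    for i in range(grid + 1):
        q = i / grid
        for j in range(grid + 1):
            a = beta * j / grid
            if q * r(a, beta) <= R + 1e-12:
                best = min(best, q * a + (1 - q) * beta)
    return best
```

A finer grid tightens the approximation; the search always returns an achievable point, so it never underestimates $D^{WZ}(R)$.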



B. A Trivial Converse for the WZBC Problem 

At each terminal, no WZBC scheme can achieve a distortion less than the minimum distortion achievable by ignoring the 
other terminals. Thus, 

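$$D_k \ge D_k^{WZ}(\kappa C_k) \qquad (10)$$

where $C_k$ is the capacity of channel $k$. For the source-channel pairs we consider, (10) can be further specialized. For the
quadratic Gaussian case, we obtain using (6) and

$$C_k = \frac{1}{2} \log\left(1 + \frac{P}{W_k}\right)$$

that

$$D_k \ge \frac{N_k}{\left(1 + \frac{P}{W_k}\right)^{\kappa}} . \qquad (11)$$

For the binary Hamming case, using (9) and $C_k = 1 - H_2(p_k)$, the converse becomes

$$D_k \ge \min_{\substack{0 \le q \le 1,\ 0 \le a \le \beta_k: \\ q\, r(a, \beta_k) \le \kappa[1 - H_2(p_k)]}} \; q a + (1 - q)\beta_k .$$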
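
For the quadratic Gaussian case the trivial converse is a one-line computation. The sketch below assumes the standard AWGN capacity $\frac{1}{2}\log(1 + P/W_k)$ and the point-to-point bound $N_k 2^{-2\kappa C_k}$; the function names and the example parameters are hypothetical.

```python
import math


def awgn_capacity(P, W):
    """Capacity in bits per channel use of an AWGN channel (power P, noise W)."""
    return 0.5 * math.log2(1.0 + P / W)


def converse_gaussian(N, P, W, kappa):
    """Per-receiver lower bound N * 2^(-2 * kappa * C) on the distortion."""
    return N * 2.0 ** (-2.0 * kappa * awgn_capacity(P, W))


if __name__ == "__main__":
    # Two receivers: (side-information error variance, channel noise variance).
    receivers = [(0.5, 1.0), (0.9, 0.5)]
    for N_k, W_k in receivers:
        print(converse_gaussian(N_k, P=3.0, W=W_k, kappa=1.0))
```

The bound equals $N_k / (1 + P/W_k)^{\kappa}$, so better side information (smaller $N_k$) and a better channel (smaller $W_k$) both lower it.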

C. Separate Source and Channel Coding 

For a general source and channel pair, the source and channel coding problems are extremely challenging. The set of all
achievable rate triples (common and two private rates) for general broadcast channels is not known. The corresponding
source coding problem has not been explicitly considered in previous work either. But there is considerable simplification in
the quadratic Gaussian and binary Hamming cases since the channel and the side information are degraded in both cases: we
can assume that one of the two Markov chains, $U - V_1 - V_2$ or $U - V_2 - V_1$, holds (for arbitrary channel input $U$) for the
channel, and similarly either $X - Y_1 - Y_2$ or $X - Y_2 - Y_1$ holds for the source. The capacity region for degraded broadcast
channels is fully known. In fact, since any information sent to the weaker channel can be decoded by the stronger channel, we
can assume that no private information is sent to the weaker channel. As a result, two layer source coding, which has been
considered in [15], [17], [18], is sufficiently general.

To be able to analyze $U - V_1 - V_2$ and $U - V_2 - V_1$ simultaneously, we denote the random variables, rates, and distortion
levels associated with the good channel by the subscript $g$ and those associated with the bad one by $b$, i.e., the channel variables
always satisfy $U - V_g - V_b$ where $g$ is either 1 or 2 and $b$ takes the other value. Let $\mathcal{C}(\kappa)$ denote the capacity region for $\kappa$
channel uses, i.e., the region of all pairs of total rates that can be simultaneously decoded by each receiver. As shown in [1],
[5], $\mathcal{C}(\kappa)$ is the convex closure of all $(R_b, R_g)$ such that there exist a channel input $U \in \mathcal{U}$ and an auxiliary random variable
$U_b \in \mathcal{U}_b$ satisfying $U_b - U - V_g - V_b$, the power constraint (if any) $E[U^2] \le P$, and

$$R_b \le \kappa I(U_b;V_b) \qquad (12)$$
$$R_g \le \kappa [I(U_b;V_b) + I(U;V_g|U_b)]. \qquad (13)$$



Let $\mathcal{R}(D_b, D_g)$ be the set of total rates that must be sent to each source decoder to enable the receivers to reconstruct the
source within the respective distortions $D_b$ and $D_g$. A distortion pair $(D_b, D_g)$ is achievable by separate source and channel
coding with $\kappa$ channel uses per source symbol if and only if

$$\mathcal{R}(D_b, D_g) \cap \mathcal{C}(\kappa) \ne \emptyset .$$

Note that we use cumulative rates at the good receiver.

Despite the simplification brought by degraded side information, there is no known complete single-letter characterization
of $\mathcal{R}(D_b, D_g)$ for all sources and distortion measures when $X - Y_b - Y_g$. Let $\mathcal{R}^*(D_b, D_g)$ be defined as the convex closure of
all $(R_b, R_g)$ such that there exist source auxiliary random variables $(Z_b, Z_g) \in \mathcal{Z}_b \times \mathcal{Z}_g$ with either $(Y_b, Y_g) - X - Z_b - Z_g$
or $(Y_b, Y_g) - X - Z_g - Z_b$, and reconstruction functions $g_k : \mathcal{Z}_k \times \mathcal{Y}_k \to \hat{\mathcal{X}}_k$ satisfying

$$E[d_k(X, g_k(Z_k, Y_k))] \le D_k \qquad (14)$$

for $k = b, g$, and

$$R_b \ge I(X;Z_b|Y_b) \qquad (15)$$
$$R_g \ge \begin{cases} I(X;Z_b|Y_b) + [I(X;Z_g|Y_g) - I(X;Z_b|Y_g)]^+ & \text{if } X - Y_g - Y_b \\ I(X;Z_g|Y_g) + [I(X;Z_b|Y_b) - I(X;Z_g|Y_b)]^+ & \text{if } X - Y_b - Y_g . \end{cases} \qquad (16)$$

It was shown in [15] that $\mathcal{R}(D_b, D_g) = \mathcal{R}^*(D_b, D_g)$ when $X - Y_g - Y_b$. On the other hand, [17] showed that even when
$X - Y_b - Y_g$, $\mathcal{R}(D_b, D_g) = \mathcal{R}^*(D_b, D_g)$ for the quadratic Gaussian problem. For all other sources and distortion measures,
we only know $\mathcal{R}(D_b, D_g) \supseteq \mathcal{R}^*(D_b, D_g)$ in general when $X - Y_b - Y_g$. We shall present explicit expressions for the complete
tradeoff in the quadratic Gaussian case in Section V and an achievable tradeoff for the binary Hamming case in Section VI.

D. Uncoded Transmission 

In the bandwidth-matched case, i.e., when $\kappa = 1$, if the source and channel alphabets are compatible, uncoded transmission
is a possible strategy. For the quadratic Gaussian case, the distortion achieved by uncoded transmission is given by

$$D_k = \frac{N_k W_k}{W_k + P N_k} \qquad (17)$$

for $k = 1, 2$. This is because the channel is the same as the test channel up to a scaling factor. More specifically,
when $\sqrt{P} X$ is transmitted and corrupted by noise $W_k$, one can write $X = Z_k + S_k$ with $S_k \perp Z_k$, where $Z_k$ is an appropriately
scaled version of the received signal $\sqrt{P} X + W_k$ and

$$S_k = \frac{W_k}{W_k + P} .$$

Substituting this into (5) then yields (17). Comparing with (11), we note that (17) achieves $D_k^{WZ}(C_k)$ only when $N_k = 1$ or
when $W_k \to \infty$, which, in turn, translate to trivial $Y_k$ or zero $C_k$, respectively.
For the binary Hamming case, this strategy achieves the distortion pair

$$D_k = \min\{p_k, \beta_k\} \qquad (18)$$

for $k = 1, 2$. That is because the channel is the same as the test channel that achieves $D^{WZ}(R)$ with $q = 1$. The distortion
expression in (18) then follows using (8). One can also show that (18) coincides with $D_k^{WZ}(C_k)$ only when $\beta_k = \frac{1}{2}$ or $p_k = \frac{1}{2}$.
Once again, these respectively correspond to trivial $Y_k$ and zero $C_k$.
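
The gap between uncoded transmission and the trivial converse in the bandwidth-matched Gaussian case can be made concrete. The sketch below assumes $\kappa = 1$, a unit-variance source sent as $\sqrt{P} X$, the induced test-channel variance $W_k/(W_k + P)$, and the MMSE combination of the channel output with the side information; the function names are illustrative.

```python
def uncoded_gaussian(N, P, W):
    """Distortion of uncoded transmission with side-information error variance N.

    The channel output acts as a test channel with variance S = W / (W + P);
    combining it with the side information gives N*S / (N + S*(1 - N)).
    """
    S = W / (W + P)
    return N * S / (N + S * (1.0 - N))


def uncoded_gaussian_closed_form(N, P, W):
    """The same quantity after simplification: N*W / (W + P*N)."""
    return N * W / (W + P * N)


def converse_gaussian_matched(N, P, W):
    """Point-to-point Wyner-Ziv bound at kappa = 1: N / (1 + P/W)."""
    return N * W / (W + P)


def uncoded_binary(p, beta):
    """Binary Hamming uncoded distortion: trust the better of channel and side info."""
    return min(p, beta)


if __name__ == "__main__":
    for N in (1.0, 0.5, 0.1):
        print(N, uncoded_gaussian(N, 3.0, 1.0), converse_gaussian_matched(N, 3.0, 1.0))
```

The two Gaussian quantities coincide at $N = 1$ (no useful side information) and separate as the side information improves, which is the regime where coded schemes have room to help.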



III. Basic WZBC Schemes

In this section, we present the basic coding schemes that we shall then develop into the schemes that form the main
contributions of this paper. In what follows, we only present code constructions for discrete sources and channels. The
constructions can be extended to the continuous case in the usual manner. Our coding arguments rely heavily on the notion
of typicality. Given a random variable $X \sim P_X(x)$ defined over a discrete alphabet $\mathcal{X}$, the typical set $T_\delta^n(X)$ at block length $n$ is
defined as in [11], in terms of $N(a|x^n)$, the number of times $a$ appears in $x^n$.

The first scheme, termed Common Description Scheme (CDS), is a basic extension of the scheme in [19] where the source
is first quantized before transmission over the channel. Even though our layered schemes are constructed for the case of
$K = 2$ receivers, CDS can be utilized for any $K \ge 2$. Unlike in [19], where typical source words are placed in one-to-one
correspondence with a channel codebook, the source words are first mapped to quantized versions and it is these quantized
versions that are mapped to the channel codebook. Like [19], there is no explicit binning, but the channel performs virtual
binning. Before discussing the performance of the CDS, we shall present an extension of the CDS for a more general coding
problem.

Suppose that there is CSI available solely at the encoder, i.e., the broadcast channel is defined by the transition probability
$p_{V_1 V_2|US}(v_1, v_2|u, s)$ and the CSI $S^m \in T_\eta^m(S)$ with some $\eta > 0$, where $S$ is some fixed distribution defined on the CSI
alphabet $\mathcal{S}$, is available non-causally at the encoder. Given a source and side information at the decoders $(X, Y_1, Y_2)$, codes
$(m, n, \phi, \psi_1, \psi_2)$ and achievability of distortion pairs are defined as in the WZBC scenario except that the encoder now takes
the form $\phi : \mathcal{X}^n \times \mathcal{S}^m \to \mathcal{U}^m$. The following theorem characterizes the performance of an extension of the CDS, which we
term CDS with DPC.

Theorem 1: A distortion tuple $(D_1, \ldots, D_K)$ is achievable at rate $\kappa$ if there exist random variables $Z \in \mathcal{Z}$, $T \in \mathcal{T}$, $U \in \mathcal{U}$
and functions $g_k : \mathcal{Z} \times \mathcal{Y}_k \to \hat{\mathcal{X}}_k$ with $(Y_1, \ldots, Y_K) - X - Z$ and $T - (U, S) - (V_1, \ldots, V_K)$ such that

$$I(X;Z|Y_k) < \kappa [I(T;V_k) - I(T;S)] \qquad (19)$$
$$E[d_k(X, g_k(Z, Y_k))] \le D_k \qquad (20)$$

for $k = 1, \ldots, K$.
Proof:

The code construction is as follows. For fixed $\delta, \delta', \delta'' > 0$, a source codebook $\mathcal{C}_Z = \{z^n(i), i = 1, \ldots, M\}$ is chosen
from $T_\delta^n(Z)$. A set of $M$ bins $\mathcal{C}_T(i) = \{t^m(i,j), j = 1, \ldots, M'\}$, where each $t^m(i,j)$ is chosen randomly at uniform
from $T_\delta^m(T)$, is also constructed. Given a source word $X^n$ and CSI $S^m$, the encoder tries to find a pair $(i^*, j^*)$ such that
$(X^n, z^n(i^*)) \in T_{\delta'}^n(X, Z)$ and $(S^m, t^m(i^*, j^*)) \in T_{\delta'}^m(S, T)$. If it is unsuccessful, it declares an error. If it is successful,
the channel input is drawn from the distribution $\prod_{l=1}^{m} p_{U|TS}(u_l | t_l(i^*, j^*), s_l)$. At terminal $k$, the decoder goes through all
pairs $(i, j) \in \{1, \ldots, M\} \times \{1, \ldots, M'\}$ until it finds the first pair satisfying $(Y_k^n, z^n(i)) \in T_{\delta''}^n(Y_k, Z)$ and $(V_k^m, t^m(i, j)) \in
T_{\delta''}^m(V_k, T)$ simultaneously. If there is no such pair, the decoder sets $\hat{i} = 1$, $\hat{j} = 1$. Once $(\hat{i}, \hat{j})$ is decided, coordinate-wise
reconstruction is performed using $g_k$ with $Y_k^n$ and $z^n(\hat{i})$.
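We define the error events as

$$\mathcal{E}_1 = \left\{ \forall (i, j):\ \text{either } (X^n, z^n(i)) \notin T_{\delta'}^n(X, Z) \text{ or } (S^m, t^m(i, j)) \notin T_{\delta'}^m(S, T) \right\}$$
$$\mathcal{E}_2(k) = \left\{ (Y_k^n, z^n(i^*)) \notin T_{\delta''}^n(Y_k, Z) \right\}$$
$$\mathcal{E}_3(k) = \left\{ (V_k^m, t^m(i^*, j^*)) \notin T_{\delta''}^m(V_k, T) \right\}$$
$$\mathcal{E}_4(k) = \left\{ \exists (i \ne i^*, j):\ (Y_k^n, z^n(i)) \in T_{\delta''}^n(Y_k, Z) \text{ and } (V_k^m, t^m(i, j)) \in T_{\delta''}^m(V_k, T) \right\} .$$

Using standard typicality arguments, it can be shown that for fixed $\delta, \delta', \delta''$, if

$$M > 2^{n[I(X;Z) + \epsilon_1(\delta, \delta', \delta'')]}$$

and

$$M' > 2^{m[I(S;T) + \epsilon_1(\delta, \delta', \delta'')]}$$

then $\Pr[\mathcal{E}_1] < \epsilon$, and that $\Pr[\mathcal{E}_2(k)] < \epsilon$ and $\Pr[\mathcal{E}_3(k)] < \epsilon$ for any $\epsilon > 0$ and large enough $n$. Similarly, it follows that if

$$M < 2^{n[I(X;Z) + 2\epsilon_1(\delta, \delta', \delta'')]}$$

and

$$M' < 2^{m[I(S;T) + 2\epsilon_1(\delta, \delta', \delta'')]}$$

then

$$\Pr[\mathcal{E}_4(k)] \le M \cdot M' \cdot 2^{-n[I(Y_k;Z) - \epsilon_2(\delta, \delta', \delta'')]}\, 2^{-m[I(T;V_k) - \epsilon_2(\delta, \delta', \delta'')]}$$
$$= M \cdot M' \cdot 2^{-n[I(Y_k;Z) + \kappa I(T;V_k) - (\kappa + 1)\epsilon_2(\delta, \delta', \delta'')]}$$
$$< 2^{n[I(X;Z|Y_k) - \kappa\{I(T;V_k) - I(S;T)\} + (\kappa + 1)\epsilon_2(\delta, \delta', \delta'') + 2(\kappa + 1)\epsilon_1(\delta, \delta', \delta'')]} .$$

This probability also vanishes as $\delta, \delta', \delta'' \to 0$ thanks to (19). This completes the proof. ■
Note that, if $S$ is a trivial random variable, independent of the channel, the scenario becomes the original WZBC setup and
CDS with DPC becomes CDS. By equating $T$ and $U$, we obtain the following corollary that characterizes the performance of
the CDS.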

Corollary 1: A distortion tuple $(D_1, \ldots, D_K)$ is achievable at rate $\kappa$ for the WZBC problem if there exist random variables
$Z \in \mathcal{Z}$, $U \in \mathcal{U}$ and functions $g_k : \mathcal{Z} \times \mathcal{Y}_k \to \hat{\mathcal{X}}_k$ with $(Y_1, \ldots, Y_K) - X - Z$ such that

$$I(X;Z|Y_k) < \kappa I(U;V_k) \qquad (21)$$
$$E[d_k(X, g_k(Z, Y_k))] \le D_k \qquad (22)$$

for $k = 1, \ldots, K$.
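
To get a feel for Corollary 1, one can evaluate it under specific, assumed choices. The sketch below is not the paper's analysis: it fixes a Gaussian test channel $X = Z + S$ and a Gaussian channel input of power $P$ in the quadratic Gaussian setting, takes the boundary of the rate condition, and computes the smallest test-channel variance that all receivers can support, together with the resulting distortions. All function names are hypothetical.

```python
def cds_gaussian(Ns, Ws, P, kappa):
    """Distortions of a common-description scheme under assumed Gaussian choices.

    Ns[k]: side-information error variance at receiver k (source variance 1).
    Ws[k]: channel noise variance at receiver k.
    The condition 0.5*log2(1 - N + N/S) <= kappa*0.5*log2(1 + P/W) rearranges to
    S >= N / ((1 + P/W)**kappa - 1 + N); the common S must satisfy it for all k.
    """
    s_common = max(N / ((1.0 + P / W) ** kappa - 1.0 + N) for N, W in zip(Ns, Ws))
    distortions = [N * s_common / (N + s_common * (1.0 - N)) for N in Ns]
    return s_common, distortions


def converse_gaussian(N, P, W, kappa):
    """Trivial per-receiver bound N / (1 + P/W)**kappa, for comparison."""
    return N / (1.0 + P / W) ** kappa


if __name__ == "__main__":
    Ns, Ws = [0.5, 0.9], [1.0, 0.5]
    s, Ds = cds_gaussian(Ns, Ws, P=3.0, kappa=1.0)
    for N, W, D in zip(Ns, Ws, Ds):
        print(D, converse_gaussian(N, 3.0, W, 1.0))
```

In this sketch the receiver that determines the common variance meets its trivial converse, while every other receiver is left with slack; when all receivers impose the same threshold, all of them meet the converse simultaneously.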

Corollary 2: The coding scheme in the proof of Theorem 1 can also decode $t^m(i^*, j^*)$ successfully.
Proof: Define

$$\mathcal{E}_5(k) = \left\{ \exists j \ne j^*:\ (V_k^m, t^m(i^*, j)) \in T_{\delta''}^m(V_k, T) \right\} .$$

It then suffices to show that $\Pr[\mathcal{E}_5(k)] < \epsilon$ for large enough $n$. Indeed, since $I(T;V_k) - I(S;T) > 0$,

$$\Pr[\mathcal{E}_5(k)] \le M'\, 2^{-m[I(T;V_k) - \epsilon_2(\delta, \delta', \delta'')]}$$
$$< 2^{-m[I(T;V_k) - I(S;T) - \epsilon_2(\delta, \delta', \delta'') - 2\epsilon_1(\delta, \delta', \delta'')]}$$
$$< \epsilon .$$

The assumption $I(T;V_k) - I(S;T) > 0$ is not restrictive at all, because otherwise no information can be delivered to terminal
$k$ to begin with. ■

The significance of Corollary 2 is that decoding $t^m(i^*, j^*)$ provides information about the CSI $S^m$. This information, in
turn, will be very useful in our layered WZBC schemes where the CSI is self-imposed and related to the source $X^n$ itself.

Examining the proof of Theorem 1, we notice an apparent separation between source and channel coding in that the source
and channel codebooks are independently chosen. Furthermore, successful transmission is possible as long as the source coding 
rate for each terminal is less than the corresponding channel coding rate for a common channel input. However, the decoding 
must be jointly performed and neither scheme can be split into separate stand-alone source and channel codes. Nevertheless, 
due to the quasi-independence of the source and channel codebooks we shall refer to source codes and channel codes separately 
when we discuss layered WZBC schemes. This quasi-separation was shown to be optimal for the SWBC problem and was 
termed operational separation in [19]. 

IV. A Layered WZBC Scheme 

In this section, we focus on the case of K = 2 receivers. In CDS, the same information is conveyed to both receivers. 
However, since the side information and channel characteristics at the two receiving terminals can be very different, we might 
be able to improve the performance by layered coding, i.e., by not only transmitting a common layer (CL) to both receivers 
but also additionally transmitting a refinement layer (RL) to one of the two receivers. The resultant interference between the 
CL and RL can then be mitigated by successive decoding or by dirty paper encoding. Since there are two receivers, we are 
focusing on coding with only two layers because intuitively, more layers targeted for the same receiver can only degrade the 
performance. 

Unless the better channel also has access to better side information, it is not straightforward to decide which receiver should
receive only the CL and which should additionally receive the RL. We shall therefore refer to the decoders as the CL decoder 
and the RL decoder (which necessarily also decodes the CL) instead of using the subscripts 1 and 2. For the quadratic Gaussian 
problem, we will later develop an analytical decision tool. For all other sources and channels, one can combine the distortion 
regions resulting from the two choices, namely, CL decoder = 1 and RL decoder = 2 and vice versa. For ease of exposition, 
for a given choice of CL and RL decoders, we also rename the source and channel random variables by replacing the subscripts 
1 and 2 by c (for random variables corresponding to the CL information or to the CL decoder) and r (for random variables 
corresponding to the RL information or to the receiver that decodes both CL and RL). 

As mentioned earlier, the inclusion of an RL codeword changes the effective channel observed while decoding the CL. It is 
on this modified channel that we send the CL using CDS or CDS with DPC, and the respective channel rate expressions in 
(21) and (19) must be modified in a manner that we describe in the following subsections where we also present the capacity
of the effective channel for transmitting the RL. Each possible order of channel encoding and decoding (at the RL decoder)
leads to a different scheme. We shall concentrate on the scheme that has the best performance among the four in the Gaussian
and binary Hamming cases, deferring a discussion of the other three to Appendix A. In this scheme, illustrated in Figure 2,
the CL is coded using CDS with DPC with the RL codeword acting as CSI. We shall refer to this scheme as the Layered 
Description Scheme (LDS). We characterize the source and channel coding rates for LDS in the following. We will only sketch 
the proofs of the theorems, as they rely only on CDS with DPC, and other standard tools. 

A. Source Coding Rates for LDS 

The RL is transmitted by separate source and channel coding. In coding the source, we restrict our attention to systems
where the communicated information satisfies $(Y_c, Y_r) - X - Z_r - Z_c$ where $Z_c$ corresponds to the CL and $Z_r$ is the RL.
The source coding rate for the RL is therefore $I(X;Z_r|Z_c, Y_r)$ (cf. [17]). This has to be less than the RL capacity. Due to
the separability of the source and channel variables in the required inequalities we can say that a distortion pair $(D_c, D_r)$ is
achievable if

$$\mathcal{R}_{LDS}(D_c, D_r) \cap \mathcal{C}_{WZBC}(\kappa) \ne \emptyset .$$

Here, $\mathcal{C}_{WZBC}(\kappa)$ is the "capacity" region achieved by either LDS or any of its variations discussed in Appendix A, and
$\mathcal{R}_{LDS}(D_c, D_r)$ is the set of all triplets $(R^s_{cc}, R^s_{cr}, R^s_{rr})$ so that there exist $(Z_c, Z_r)$ and reconstruction functions $g_c : \mathcal{Z}_c \times \mathcal{Y}_c \to
\hat{\mathcal{X}}_c$ and $g_r : \mathcal{Z}_r \times \mathcal{Y}_r \to \hat{\mathcal{X}}_r$ satisfying $(Y_c, Y_r) - X - Z_r - Z_c$ and

$$I(X;Z_c|Y_c) < R^s_{cc} \qquad (23)$$
$$I(X;Z_c|Y_r) < R^s_{cr} \qquad (24)$$
$$I(X;Z_r|Z_c, Y_r) < R^s_{rr} \qquad (25)$$
$$E[d_c(X, g_c(Z_c, Y_c))] \le D_c \qquad (26)$$
$$E[d_r(X, g_r(Z_r, Y_r))] \le D_r . \qquad (27)$$

Fig. 2. Components of LDS: $Z_c^n(i)$ and $Z_r^n(j|i)$ are the first and second stage quantized source words. $Z_r^n(j|i)$ is binned and the bin index $j'$ is channel
coded to $U_r^m(j')$ in the usual sense. $Z_c^n(i)$, on the other hand, is mapped to $U_c^m(i)$ using CDS with DPC, where $U_r^m(j')$ serves as the CSI. The two
channel codewords are then superposed, resulting in $U^m$. Decoding of $Z_c^n(i)$ is exactly as in CDS with DPC at both receivers. In decoding of $Z_r^n(j|i)$, the
refinement channel decoder makes use of both the channel output $V_r^m$ and the auxiliary codeword $T^m$ to decode the bin index $j'$.

The subscripts cc and cr are used to emphasize transmission of the CL to receivers c and r, respectively. Similarly, the subscript
rr refers to transmission of RL to receiver r.

B. Channel Coding Rates for LDS 

The next theorem provides the effective channel rate region for LDS. 

Theorem 2: Let $\mathcal{R}_{\mathrm{LDS}}(\kappa)$ be the union of all $(R^c_{cc}, R^c_{cr}, R^c_{rr})$ for which there exist $U_c \in \mathcal{U}_c$, $U_r \in \mathcal{U}_r$, and $T \in \mathcal{T}$ with $T - (U_r, U_c) - (V_r, V_c)$ and $(U_r, U_c) - U - (V_r, V_c)$ such that

$$R^c_{cc} \le \kappa [I(T; V_c) - I(T; U_r)] \tag{28}$$
$$R^c_{cr} \le \kappa [I(T; V_r) - I(T; U_r)] \tag{29}$$
$$R^c_{rr} \le \kappa\, I(U_r; T, V_r) . \tag{30}$$

Then $\mathcal{R}_{\mathrm{LDS}}(\kappa) \subseteq \mathcal{C}_{\mathrm{WZBC}}(\kappa)$.

Remark 1: The various random variables that appear in Theorem 2 have the following interpretation: $V_c$ and $V_r$ are the channel outputs when the input is $U$. $U_c$ and $U_r$ correspond to the partial channel codewords that are superposed to form the channel input. Finally, $T$ is the auxiliary random variable used in DPC, with $U_r$ forming the CSI.

Remark 2: In LDS, a trivial $U_r$ together with $T = U$ reduces to CDS.

Proof: We construct an RL codebook with elements from the typical set $T^m(U_r)$. We then use the CDS with DPC construction with the chosen RL codeword acting as CSI. It follows from Theorem 1 that the CL information can be successfully decoded (together with the auxiliary codeword $T^m$) at both receivers if (28) and (29) are satisfied. This way, the effective communication system for transmission of the RL becomes a channel with $U_r^m$ as input and the pair $T^m$ and $V_r^m$ as output. For reliable transmission, (30) is then sufficient. ■



V. Performance Analysis for the Quadratic Gaussian Problem

In this section, we analyze the distortion tradeoff of the LDS for the quadratic Gaussian case. While CDS with DPC is developed only as a tool to be used in layered WZBC codes, CDS itself is a legitimate WZBC strategy. We thus analyze its performance in some detail first before proceeding with LDS. It turns out, somewhat surprisingly, that CDS may in fact be the optimal strategy for an infinite family of source and channel parameters. Understanding the performance of CDS also gives insight into which receiver should be chosen as receiver c, and which one as receiver r. We remind the reader that the variance of a Gaussian random variable $A$ will be denoted by $A$.

A. CDS for the Quadratic Gaussian Problem

Using the test channel $X = Z + S$ with Gaussian $S$ and $Z$ where $S \perp Z$, and a Gaussian channel input $U$, the condition of Corollary 1 becomes (cf. (6))

$$\frac{1}{S} \le 1 + \min_{k} \frac{\left(1 + \frac{P}{W_k}\right)^{\kappa} - 1}{N_k} .$$

By analyzing (5), it is clear that $S$ should be chosen so as to achieve the above inequality with equality. Substituting that choice in (5) yields

$$D_k = \left[ \frac{1}{N_k} + \min_{k'} \frac{\left(1 + \frac{P}{W_{k'}}\right)^{\kappa} - 1}{N_{k'}} \right]^{-1}$$

for $k = 1, \ldots, K$. In other words,

$$\frac{1}{D_k} = \frac{1}{N_k} + \min_{k'} \frac{\left(1 + \frac{P}{W_{k'}}\right)^{\kappa} - 1}{N_{k'}} . \tag{31}$$

For all $k^*$ that achieve the minimum in (31), we have

$$\frac{1}{D_{k^*}} = \frac{\left(1 + \frac{P}{W_{k^*}}\right)^{\kappa}}{N_{k^*}} .$$

Thus, $D_{k^*} = D^{WZ}_{k^*}(\kappa C_{k^*})$. This, in particular, means that if

$$\frac{\left(1 + \frac{P}{W_k}\right)^{\kappa} - 1}{N_k}$$

is a constant, CDS achieves the trivial converse and there is no need for a layered WZBC scheme. Specialization of (31) to the case $\kappa = 1$ is also of interest:

$$\frac{1}{D_k} = \frac{1}{N_k} + \frac{P}{\max_{k'} \{W_{k'} N_{k'}\}} . \tag{32}$$

In particular, all $k^*$ maximizing $W_{k^*} N_{k^*}$ achieve $D_{k^*} = D^{WZ}_{k^*}(C_{k^*})$. Thus, the trivial converse is achieved if $W_k N_k$ is a constant.
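The CDS analysis above can be checked numerically. The following is a minimal sketch, assuming that (31) reads $1/D_k = 1/N_k + \min_{k'} [(1 + P/W_{k'})^{\kappa} - 1]/N_{k'}$ and that the trivial converse is $D^{WZ}_k(\kappa C_k) = N_k (1 + P/W_k)^{-\kappa}$. The function names and the sample parameters are illustrative and do not come from the paper.

```python
def cds_distortions(N, W, P, kappa=1.0):
    """Distortion D_k of each receiver under CDS, using the assumed form of (31).

    N[k]: conditional variance of the source given the side information at receiver k.
    W[k]: channel noise variance at receiver k.  P: transmit power.
    """
    # The same bottleneck term (a minimum over all receivers) enters every D_k.
    common = min(((1.0 + P / w) ** kappa - 1.0) / n for n, w in zip(N, W))
    return [1.0 / (1.0 / n + common) for n in N]


def trivial_converse(n, w, P, kappa=1.0):
    """Point-to-point Wyner-Ziv bound N_k * (1 + P/W_k)^(-kappa)."""
    return n * (1.0 + P / w) ** (-kappa)


if __name__ == "__main__":
    N, W, P = [0.5, 0.8], [1.0, 0.5], 1.0
    for k, d in enumerate(cds_distortions(N, W, P)):
        print(k + 1, d, trivial_converse(N[k], W[k], P))
```

With these sample values, receiver 1 attains the minimum in (31) and meets its trivial converse, while receiver 2 does not. When $W_k N_k$ is the same for all receivers (and $\kappa = 1$), every receiver meets its bound.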

B. LDS for the Quadratic Gaussian Problem 

For LDS, we begin by analyzing the channel coding performance and then the source coding performance in terms of achievable channel rates. Then, closely examining the channel rate regions, we determine whether $c = 1, r = 2$ or $c = 2, r = 1$ is more advantageous given $\kappa$, $P$, $N_1$, $N_2$, $W_1$, and $W_2$. The resultant expression when $\kappa = 1$ exhibits an interesting phenomenon which we will make use of in deriving closed form expressions for the $(D_c, D_r)$ tradeoff in LDS.

1) Channel Coding Performance: For LDS, we choose channel variables $U_c$ and $U_r$ as independent zero-mean Gaussians with variances $\nu P$ and $\bar{\nu} P$, respectively, with $0 \le \nu \le 1$ and $\bar{\nu} = 1 - \nu$, and use the superposition rule $U = U_c + U_r$. Motivated by Costa's construction for the auxiliary random variable $T$, we set $T = \gamma U_r + U_c$. Using (28)-(30), we obtain achievable $(R^c_{cc}, R^c_{cr}, R^c_{rr})$ as

$$R^c_{cc} = I(\gamma U_r + U_c; U_c + U_r + W_c) - I(U_r; \gamma U_r + U_c)$$
$$= h(U_c + U_r + W_c) + h(U_c) - h(\gamma U_r + U_c, U_c + U_r + W_c)$$
$$= \frac{1}{2} \log \frac{[P + W_c]\, \nu P}{\det \begin{bmatrix} \gamma^2 \bar{\nu} P + \nu P & \gamma \bar{\nu} P + \nu P \\ \gamma \bar{\nu} P + \nu P & P + W_c \end{bmatrix}}$$
$$= \frac{1}{2} \log \frac{1 + \frac{P}{W_c}}{1 + \bar{\nu} P \left( \frac{\gamma^2}{\nu P} + \frac{(1-\gamma)^2}{W_c} \right)} \tag{33}$$

$$R^c_{cr} = I(\gamma U_r + U_c; U_c + U_r + W_r) - I(U_r; \gamma U_r + U_c)$$
$$= \frac{1}{2} \log \frac{1 + \frac{P}{W_r}}{1 + \bar{\nu} P \left( \frac{\gamma^2}{\nu P} + \frac{(1-\gamma)^2}{W_r} \right)} \tag{34}$$

$$R^c_{rr} = I(U_r; \gamma U_r + U_c, U_c + U_r + W_r)$$
$$= h(\gamma U_r + U_c, U_c + U_r + W_r) - h(U_c, U_c + W_r)$$
$$= h(\gamma U_r + U_c, U_c + U_r + W_r) - h(U_c) - h(W_r)$$
$$= \frac{1}{2} \log \frac{\det \begin{bmatrix} \gamma^2 \bar{\nu} P + \nu P & \gamma \bar{\nu} P + \nu P \\ \gamma \bar{\nu} P + \nu P & P + W_r \end{bmatrix}}{\nu P\, W_r}$$
$$= \frac{1}{2} \log \left( 1 + \bar{\nu} P \left( \frac{\gamma^2}{\nu P} + \frac{(1-\gamma)^2}{W_r} \right) \right) . \tag{35}$$

Here, (34) follows by replacing $W_c$ with $W_r$ in (33).
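The rate expressions of this subsection are easy to evaluate numerically. The sketch below assumes the closed forms of (33)-(35) with $T = \gamma U_r + U_c$, power split $\nu P$ / $(1-\nu) P$, and rates in bits per channel use. The helper names are illustrative, and `costa_gamma` is the standard Costa choice $\nu P / (\nu P + W)$ referred to later in the text.

```python
import math


def lds_channel_rates(P, nu, gamma, W_c, W_r):
    """(R_cc, R_cr, R_rr) from the assumed closed forms of (33)-(35).

    Requires 0 < nu <= 1, because the term gamma^2 / (nu * P) divides by nu.
    """
    nu_bar = 1.0 - nu

    def penalty(W):
        # 1 + nu_bar*P*(gamma^2/(nu*P) + (1-gamma)^2/W): the loss caused by the RL codeword.
        return 1.0 + nu_bar * P * (gamma ** 2 / (nu * P) + (1.0 - gamma) ** 2 / W)

    R_cc = 0.5 * math.log2((1.0 + P / W_c) / penalty(W_c))
    R_cr = 0.5 * math.log2((1.0 + P / W_r) / penalty(W_r))
    R_rr = 0.5 * math.log2(penalty(W_r))
    return R_cc, R_cr, R_rr


def costa_gamma(P, nu, W):
    """Costa's parameter for a CL of power nu*P over noise variance W."""
    return nu * P / (nu * P + W)


if __name__ == "__main__":
    P, nu, W_c, W_r = 1.0, 0.5, 1.0, 0.5
    print(lds_channel_rates(P, nu, costa_gamma(P, nu, W_c), W_c, W_r))
```

Two properties are visible from the formulas and can be confirmed with the sketch: $R^c_{cr} + R^c_{rr}$ always equals $\frac{1}{2}\log(1 + P/W_r)$ regardless of $\gamma$, and Costa's $\gamma$ for receiver c gives $R^c_{cc} = \frac{1}{2}\log(1 + \nu P / W_c)$.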

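2) Source Coding Performance: We choose the auxiliary random variables so that $X = Z_r + S_r$ and $Z_r = Z_c + S'_c$, where $S_r$ and $S'_c$ are Gaussian random variables satisfying $S_r \perp Z_r$ and $S'_c \perp Z_c$. This choice imposes the Markov chain $X - Z_r - Z_c$, and implies $X = Z_c + S_c$ with $S_c \perp Z_c$ and $1 \ge S_c \ge S_r$. Using (6), one can then conclude

$$R^s_{cc} = \frac{1}{2} \log \left( 1 - N_c + \frac{N_c}{S_c} \right) \tag{36}$$
$$R^s_{cr} = \frac{1}{2} \log \left( 1 - N_r + \frac{N_r}{S_c} \right) \tag{37}$$
$$R^s_{rr} = \frac{1}{2} \log \left( \frac{1 - N_r + \frac{N_r}{S_r}}{1 - N_r + \frac{N_r}{S_c}} \right) . \tag{38}$$

For any achievable triplet $(R^c_{cc}, R^c_{cr}, R^c_{rr})$, (36)-(38) can be used to find the corresponding best $(D_c, D_r)$. More specifically, (36)-(38), together with the requirement that each source rate not exceed $\kappa$ times the corresponding channel rate, imply

$$\frac{1}{S_c} \le \min \left\{ \frac{2^{2\kappa R^c_{cc}} - 1}{N_c}, \frac{2^{2\kappa R^c_{cr}} - 1}{N_r} \right\} + 1 \tag{39}$$
$$\frac{1}{S_r} \le \frac{2^{2\kappa R^c_{rr}} \left( 1 - N_r + \frac{N_r}{S_c} \right) - 1}{N_r} + 1 . \tag{40}$$

Since we have from (5) that

$$\frac{1}{D_k} = \frac{1}{S_k} + \frac{1}{N_k} - 1 , \quad k \in \{c, r\},$$

it is easy to conclude that both (39) and (40) should be satisfied with equality to obtain the best $(D_c, D_r)$, which becomes

$$D_c = \frac{1}{\frac{1}{N_c} + \Phi} \tag{42}$$
$$D_r = \frac{N_r\, 2^{-2\kappa R^c_{rr}}}{1 + N_r \Phi}$$

where

$$\Phi = \min \left\{ \frac{2^{2\kappa R^c_{cc}} - 1}{N_c}, \frac{2^{2\kappa R^c_{cr}} - 1}{N_r} \right\} . \tag{44}$$

Now, if

$$\frac{2^{2\kappa R^c_{cr}} - 1}{N_r} \le \frac{2^{2\kappa R^c_{cc}} - 1}{N_c} \tag{45}$$

then $D_r = N_r 2^{-2\kappa (R^c_{cr} + R^c_{rr})}$. But in the LDS, we have $R^c_{cr} + R^c_{rr} = C_r = \frac{1}{2} \log \left( 1 + \frac{P}{W_r} \right)$, implying $D_r = D^{WZ}_r(\kappa C_r)$, regardless of the chosen parameters. Moreover, $D_c$ will be minimized when (45) is satisfied with equality. Thus, it suffices to consider only

$$\frac{2^{2\kappa R^c_{cc}} - 1}{N_c} \le \frac{2^{2\kappa R^c_{cr}} - 1}{N_r} \tag{46}$$

because equality in (46) already gives $D_r = D^{WZ}_r(\kappa C_r)$. We thus have

$$D_c = N_c\, 2^{-2\kappa R^c_{cc}} \tag{47}$$
$$D_r = \frac{N_r\, 2^{-2\kappa R^c_{rr}}}{1 + \frac{N_r}{N_c} \left[ 2^{2\kappa R^c_{cc}} - 1 \right]} = \frac{N_r\, 2^{-2\kappa R^c_{rr}}}{1 + N_r \left[ \frac{1}{D_c} - \frac{1}{N_c} \right]} . \tag{48}$$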
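To make the mapping from channel rates to distortions concrete, the sketch below evaluates the best $(D_c, D_r)$ for a given rate triplet. It assumes the forms $D_c = [1/N_c + \Phi]^{-1}$ and $D_r = N_r 2^{-2\kappa R^c_{rr}} / (1 + N_r \Phi)$ with $\Phi$ the minimum appearing in (39); under (46) these reduce to (47)-(48). The function name and the symbol `phi` are illustrative, and rates are taken in bits.

```python
import math


def lds_distortions(N_c, N_r, R_cc, R_cr, R_rr, kappa=1.0):
    """Best (D_c, D_r) for channel rates (R_cc, R_cr, R_rr), rates in bits.

    phi is the smaller of the two CL terms; when the R_cc term is the smaller
    one, i.e. when (46) holds, D_c equals N_c * 2^(-2*kappa*R_cc) as in (47).
    """
    a_c = (2.0 ** (2.0 * kappa * R_cc) - 1.0) / N_c
    a_r = (2.0 ** (2.0 * kappa * R_cr) - 1.0) / N_r
    phi = min(a_c, a_r)
    D_c = 1.0 / (1.0 / N_c + phi)
    D_r = N_r * 2.0 ** (-2.0 * kappa * R_rr) / (1.0 + N_r * phi)
    return D_c, D_r


if __name__ == "__main__":
    # Illustrative rates for P = 1, nu = 0.5, gamma = 0, W_c = 1, W_r = 0.5.
    R_cc = 0.5 * math.log2(2.0 / 1.5)
    R_cr = 0.5 * math.log2(1.5 / 1.0)
    R_rr = 0.5
    print(lds_distortions(0.8, 0.5, R_cc, R_cr, R_rr))
```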
3) Choosing the Refinement Receiver: Note that setting $\nu = 1$ reduces LDS to CDS. This is regardless of which receiver is designated as c or r. This simple observation, along with the discussion in Section V-A, leads to the following lemma.

Lemma 1: In order to maximize the performance of LDS, one must set c and r so that

$$\frac{\left(1 + \frac{P}{W_c}\right)^{\kappa} - 1}{N_c} \le \frac{\left(1 + \frac{P}{W_r}\right)^{\kappa} - 1}{N_r} . \tag{49}$$

Remark 3: When $\kappa = 1$, (49) translates to

$$W_c N_c \ge W_r N_r . \tag{50}$$

Therefore, the product $W_k N_k$ determines the combined channel and side information quality, so that the "better" receiver is chosen to receive the RL information. Recall from the discussion in Section V-A that if $W_k N_k$ is constant, then in fact there is no need for refinement, as CDS already achieves the optimal performance.

Proof: When $\nu = 1$, i.e., when all the power is allocated to the CL, LDS achieves the same performance as CDS. In particular, it achieves the channel rate point

$$R^c_{cc} = C_c = \frac{1}{2} \log \left( 1 + \frac{P}{W_c} \right)$$
$$R^c_{cr} = C_r = \frac{1}{2} \log \left( 1 + \frac{P}{W_r} \right)$$
$$R^c_{rr} = 0 .$$

If (49) does not hold, then from (31), it follows that LDS also achieves $D_r = D^{WZ}_r(\kappa C_r)$ and some $D_c > D^{WZ}_c(\kappa C_c)$. Now, if we set $\nu < 1$, it is obvious that $D_r$ cannot be lowered any further. We claim that $D_c$ cannot be lowered either. Therefore, LDS would not be able to achieve a better $(D_c, D_r)$ than what CDS achieves. On the other hand, sending the refinement to receiver c could potentially result in a better performance.

Towards proving the above claim, observe from (44) that it suffices to show that neither $R^c_{cc}$ nor $R^c_{cr}$ can increase when $\nu < 1$ compared to the case $\nu = 1$. That, in turn, follows by closely examining the expressions for $R^c_{cc}$ and $R^c_{cr}$ in Section V-B.1. In particular, for LDS, both (33) and (34) will be maximized by their corresponding optimal Costa parameters, i.e., by $\gamma = \frac{\nu P}{\nu P + W_c}$ and by $\gamma = \frac{\nu P}{\nu P + W_r}$, respectively. This results in $R^c_{cc} = \frac{1}{2} \log \left( 1 + \frac{\nu P}{W_c} \right)$ and $R^c_{cr} = \frac{1}{2} \log \left( 1 + \frac{\nu P}{W_r} \right)$ as the maximum possible values, which are strictly smaller than $C_c$ and $C_r$, respectively. Therefore, the proof is complete. ■
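The rule of Lemma 1 amounts to comparing one scalar per receiver. The following sketch assumes (49) has the form $[(1 + P/W_c)^{\kappa} - 1]/N_c \le [(1 + P/W_r)^{\kappa} - 1]/N_r$, which for $\kappa = 1$ is the product rule (50). The function names and the 0-based receiver indices are illustrative choices.

```python
def combined_quality(N, W, P, kappa=1.0):
    """Scalar compared in (49); a larger value means a better overall receiver."""
    return ((1.0 + P / W) ** kappa - 1.0) / N


def assign_roles(N, W, P, kappa=1.0):
    """Return (c, r) as 0-based indices for a two-receiver system.

    The receiver with the larger combined quality gets the refinement layer.
    Ties are broken arbitrarily; in that case CDS alone is already optimal.
    """
    q0 = combined_quality(N[0], W[0], P, kappa)
    q1 = combined_quality(N[1], W[1], P, kappa)
    return (0, 1) if q0 <= q1 else (1, 0)


if __name__ == "__main__":
    # W*N products are 0.8 and 0.25, so the second receiver is the "better" one.
    print(assign_roles([0.8, 0.5], [1.0, 0.5], P=1.0))
```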

C. Performance Comparisons for the Bandwidth Matched Case: $\kappa = 1$

We first derive the closed-form $(D_c, D_r)$ tradeoff for LDS.

Lemma 2: A distortion pair $(D_c, D_r)$ is achievable using LDS if and only if $D_r \ge D_{\mathrm{LDS}}(D_c)$, where $D_{\mathrm{LDS}}(D_c)$ is the convex hull of

$$D^*_{\mathrm{LDS}}(D_c) = \begin{cases} \dfrac{N_r N_c^2 W_r D_c}{\left( D_c N_c + N_r (N_c - D_c) \right) \left( (W_r - W_c) N_c + (P + W_c) D_c \right)} & W_c \ge W_r \\[2ex] \dfrac{N_r N_c^2 W_c}{(P + W_c) \left( D_c N_c + N_r (N_c - D_c) \right)} & W_c < W_r \end{cases} \tag{51}$$

for

$$\frac{N_c W_c}{P + W_c} \le D_c \le D_c^{\max}$$

with

$$D_c^{\max} = \begin{cases} \min \left\{ N_c, \dfrac{N_c N_r (W_c - W_r)}{(P + W_c)(N_r - N_c)} \right\} & N_c < N_r,\ W_c > W_r \\[2ex] N_c & N_c \ge N_r,\ W_c \ge W_r \\[2ex] N_c \left[ \dfrac{W_c}{P + W_c} + \dfrac{P (W_c N_c - W_r N_r)}{(P + W_c)(N_c - N_r) W_r} \right] & N_c > N_r,\ W_c < W_r . \end{cases} \tag{52}$$

Remark 4: The cases $N_c < N_r, W_c < W_r$ and $N_c < N_r, W_c = W_r$ are not considered in (52) because they are prohibited by the rule (50). The same rule also guarantees $0 \le D_c^{\max} \le N_c$.

As a byproduct of the proof, which is deferred to Appendix B, we observe that the Costa parameter $\gamma$ is either 0 or 1, depending on whether $W_c > W_r$ or $W_c < W_r$, respectively. When it is 0, we have $T = U_c$. On the other hand, when $\gamma = 1$, we have $T = U = U_c + U_r$. Thus, in the latter case, setting the auxiliary codeword $T^m$ to be the same as the channel input $U^m$ constitutes the optimal choice. To the best of our knowledge, this choice, which is typically encountered in DPC for binary symmetric channels, has never been obtained as the optimal choice involving Gaussian channels.
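The closed form of Lemma 2 can be cross-checked against the parametric description it summarizes. The sketch below assumes the two branches of (51) as $N_r N_c^2 W_r D_c / \{[D_c N_c + N_r(N_c - D_c)][(W_r - W_c)N_c + (P + W_c)D_c]\}$ for $W_c \ge W_r$ and $N_r N_c^2 W_c / \{(P + W_c)[D_c N_c + N_r(N_c - D_c)]\}$ otherwise, and compares them with (33), (35), (47), (48) evaluated at $\gamma \in \{0, 1\}$ for $\kappa = 1$. All names and sample parameters are illustrative.

```python
def d_lds_star(D_c, N_c, N_r, W_c, W_r, P):
    """Assumed closed form (51): D_r as a function of D_c (before convexification)."""
    base = D_c * N_c + N_r * (N_c - D_c)
    if W_c >= W_r:
        return N_r * N_c ** 2 * W_r * D_c / (base * ((W_r - W_c) * N_c + (P + W_c) * D_c))
    return N_r * N_c ** 2 * W_c / ((P + W_c) * base)


def parametric_point(nu, N_c, N_r, W_c, W_r, P):
    """(D_c, D_r) from (33), (35), (47), (48) with gamma = 0 or 1 and kappa = 1.

    Valid only where (46) holds, i.e. for D_c no larger than D_c^max in (52).
    """
    gamma = 0.0 if W_c >= W_r else 1.0
    nu_bar = 1.0 - nu

    def penalty(W):
        return 1.0 + nu_bar * P * (gamma ** 2 / (nu * P) + (1.0 - gamma) ** 2 / W)

    two_pow_2Rcc = (1.0 + P / W_c) / penalty(W_c)  # equals 2^(2 R_cc)
    two_pow_2Rrr = penalty(W_r)                    # equals 2^(2 R_rr)
    D_c = N_c / two_pow_2Rcc
    D_r = (N_r / two_pow_2Rrr) / (1.0 + N_r * (1.0 / D_c - 1.0 / N_c))
    return D_c, D_r


if __name__ == "__main__":
    D_c, D_r = parametric_point(0.5, 0.8, 0.5, 1.0, 0.5, 1.0)
    print(D_c, D_r, d_lds_star(D_c, 0.8, 0.5, 1.0, 0.5, 1.0))
```

Sweeping `nu` over its admissible range traces out the whole curve, so the same two functions can be used to plot the LDS tradeoff for other parameter choices.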

We now compare LDS with other schemes for the WZBC problem. The performance of uncoded transmission is governed by (17). The distortion tradeoff of separate coding is given by the following lemma, which is proved in Appendix C. Recall that the subscripts b and g refer to bad and good channels, i.e., the Markov chain $U - V_g - V_b$ holds for all channel inputs $U$.

Lemma 3: For the quadratic Gaussian case with $\kappa = 1$, the distortion pair $(D_b, D_g)$ with $D^{WZ}_b(C_b) \le D_b \le N_b$ is achievable using separate coding if and only if $D_g \ge D_{\mathrm{SEP}}(D_b)$, where $D_{\mathrm{SEP}}(D_b)$ is the convex hull of

$$D^*_{\mathrm{SEP}}(D_b) = \frac{N_g N_b^2 W_g D_b}{\left( D_b N_b + N_g (N_b - D_b) \right) \left( (W_g - W_b) N_b + (P + W_b) D_b \right)} \tag{53}$$

when $X - Y_g - Y_b$, and

$$D^*_{\mathrm{SEP}}(D_b) = \frac{N_g}{(W_g - W_b) N_b + (P + W_b) D_b} \max \left\{ W_g D_b,\ \frac{N_b \left( N_g W_g - (P + W_b) D_b - N_b (W_g - W_b) \right)}{N_g - N_b} \right\} \tag{54}$$

when $X - Y_b - Y_g$.

The relative performance of the various schemes will be discussed case by case.

1) It is obvious by comparing (53) and (51) that when $W_c > W_r$ and $N_c > N_r$, LDS obtains the exact same performance as separate source and channel coding (note that $r = g$, $c = b$ in this case). The case where there is no side information, i.e., $N_1 = N_2 = 1$, falls under this category since the refinement information must go to the receiver with the better channel. Therefore we see that the purely digital LDS is worse than the schemes analyzed in [13] in the absence of side information. Preliminary results from combining LDS with hybrid analog/digital schemes as in [13] were presented in [8]. This behavior is displayed in Figures 3(d) and (e).

As for uncoded transmission, it can be better than the digital schemes. For example, consider the case $N_c = N_r = 1$ depicted in Figure 3(e), which corresponds to no side information at the receivers. In this case, uncoded transmission actually achieves the trivial converse, and therefore is the optimal strategy.

2) When $W_c > W_r$ and $N_c < N_r$, it follows from (54) and (51) that a sufficient condition for superiority of LDS over separate coding is given by

$$\frac{N_r W_r D_c}{(W_r - W_c) N_c + (P + W_c) D_c} \ge \frac{N_r N_c^2 W_r D_c}{\left( D_c N_c + N_r (N_c - D_c) \right) \left( (W_r - W_c) N_c + (P + W_c) D_c \right)}$$

which simplifies to

$$1 \ge \frac{N_c^2}{D_c N_c + N_r (N_c - D_c)}$$

and is therefore granted since $N_c < N_r$. Moreover, equality is satisfied, i.e., the two schemes have equal performance, only when $D_c = D_c^{\max} = N_c$. This behavior is exemplified in Figures 3(b) and (c). The difference between the two examples is that $D_c^{\max} = N_c$ in (b), whereas $D_c^{\max} < N_c$ in (c).

Even though $N_c = N_r = 1$ is prohibited in this case, one can consider $N_c = 1 - \epsilon$ and $N_r = 1$ with arbitrarily small $\epsilon > 0$. Uncoded transmission is also superior to all the digital schemes in this limiting case.

Fig. 3. Performance comparison for Gaussian sources and channels. In (a)-(e), $N_1 W_1 > N_2 W_2$, and therefore the choice $c = 1$, $r = 2$ is made. In addition, in (e), $N_1 = N_2 = 1$, implying that there is no side information at either receiver and hence uncoded transmission is optimal. In (f), $N_1 W_1 = N_2 W_2$, making CDS optimal.
3) Finally, when $W_c < W_r$ and $N_c > N_r$, since $r = b$, $c = g$ in this case, we need to explicitly write the best $D_c$ for a given $D_r$ for LDS. From (51), it follows that LDS can achieve

$$D_c = \frac{N_c N_r}{N_c - N_r} \left[ \frac{N_c W_c}{(P + W_c) D_r} - 1 \right] \tag{55}$$

for $D^{WZ}_r(C_r) \le D_r \le \frac{N_c N_r W_c}{N_c W_c + N_r P}$. On the other hand, (54) implies that the minimum $D_c$ that can be achieved by separate coding must necessarily satisfy

$$D_c \ge \frac{N_c N_r}{(W_c - W_r) N_r + (P + W_r) D_r} \cdot \frac{N_c W_c - (P + W_r) D_r - N_r (W_c - W_r)}{N_c - N_r} = \frac{N_c N_r}{N_c - N_r} \left[ \frac{N_c W_c}{(W_c - W_r) N_r + (P + W_r) D_r} - 1 \right] . \tag{56}$$

Superiority of LDS over separate coding then easily follows from (55) and (56), because $(P + W_c) D_r \ge (W_c - W_r) N_r + (P + W_r) D_r$ whenever $D_r \le N_r$ and $W_c < W_r$. An example of this case is shown in Figure 3(a).

We next show that LDS always outperforms uncoded transmission in this case. In fact, uncoded transmission is even worse than CDS. Since CDS achieves $D_c = D^{WZ}_c(C_c)$, it suffices to compare the $D_r$ values. Comparing (17) and (32), this reduces to showing

$$\frac{N_r W_r}{W_r + N_r P} \ge \frac{N_r N_c W_c}{N_c W_c + N_r P}$$

or equivalently

$$W_r \ge N_c W_c .$$

But since $W_r > W_c$, this is trivially true.

In Figure 3(f), we also include an example where $N_c W_c = N_r W_r$, i.e., where the combined channel and side information qualities are the same. CDS achieves the trivial converse as discussed in Section V-A. We also observed that uncoded transmission may achieve a distortion pair below the best known digital tradeoff, as shown in Figures 3(d) and (e). This was expected because it is well-known that the optimal scheme is uncoded transmission when there is no side information at either receiver, as is the case in Figure 3(e). For cases other than $N_c W_c = N_r W_r$, one could roughly say that LDS is better than uncoded transmission when the quality of the side information is sufficiently high, although we do not currently have the analytical means for comparison.

VI. Performance Analysis for the Binary Hamming Problem 

In this section, we first analyze the CDS for the binary Hamming problem and show that it can be optimal in this case as 
well. We then analyze the LDS and present numerical comparisons of the LDS with separate coding and uncoded transmission. 

A. CDS for the Binary Hamming Problem 

It follows from Corollary 1 and Equations (7) and (8) that in the binary Hamming case, if there exist $0 \le q \le 1$ and $0 \le \alpha \le \frac{1}{2}$ such that

$$q\, r(\alpha, \beta_k) \le \kappa [1 - H_2(p_k)] \tag{57}$$

for all $k$, then

$$D_k = (1 - q) \beta_k + q \min\{\alpha, \beta_k\} \tag{58}$$

can be achieved by the CDS. Unlike in the quadratic Gaussian case, the constraint (57) does not result in a single best value for $q$ and $\alpha$. Therefore, CDS produces a tradeoff of $D_k$'s rather than one best point.

As discussed at the end of Section II-A, the distortion-rate function $D^{WZ}_k(R)$ is achieved either by $q = 1$ and $\alpha \le \alpha_0(\beta_k)$, or by $0 \le q < 1$ and $\alpha = \alpha_0(\beta_k)$. The implication of this fact to the CDS is the following:

1) If $\beta_k$ are not identical, neither are $\alpha_0(\beta_k)$, and thus we need $q = 1$ and some $\alpha \le \min_k \alpha_0(\beta_k)$ to attain all $D^{WZ}_k(\kappa C_k)$ simultaneously, i.e.,

$$r(\alpha, \beta_k) = \kappa [1 - H_2(p_k)] \tag{59}$$

for all $k$. When this happens, we must necessarily have $D_k = \alpha$, i.e., $D_k$ does not depend on $k$.

2) If $\beta_k = \beta$ for $k = 1, \ldots, K$, and thus $D^{WZ}_k(R)$ does not depend on $k$, we need $C_k = C$ (and hence $p_k = p$) so that the same test channel $(q, \alpha)$ achieves $D^{WZ}_k(\kappa C_k)$ simultaneously. But this makes the problem trivial.
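The binary CDS condition is straightforward to test for a given pair $(q, \alpha)$. The sketch below assumes $r(a, b) = H_2(a * b) - H_2(a)$ with $a * b = a(1-b) + (1-a)b$; this definition is inferred from the way $r(\cdot,\cdot)$ is used in (61)-(62) and should be checked against the paper's earlier sections. The function names and sample parameters are illustrative.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b


def r(a, b):
    """Assumed definition r(a, b) = H2(a * b) - H2(a)."""
    return h2(conv(a, b)) - h2(a)


def cds_binary(q, alpha, betas, ps, kappa=1.0):
    """Check (57) for every receiver and return the distortions (58)."""
    feasible = all(q * r(alpha, b) <= kappa * (1.0 - h2(p)) + 1e-12
                   for b, p in zip(betas, ps))
    D = [(1.0 - q) * b + q * min(alpha, b) for b in betas]
    return feasible, D


if __name__ == "__main__":
    print(cds_binary(0.5, 0.05, betas=[0.1, 0.2], ps=[0.05, 0.1]))
```

Scanning $(q, \alpha)$ over a grid and keeping the feasible points produces the CDS tradeoff curve mentioned in the text, rather than a single operating point.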

B. LDS for the Binary Hamming Problem 

1) Source Coding Rates: To evaluate $R^s_{cc}$, $R^s_{cr}$, and $R^s_{rr}$, we first fix $Z_c$ and $Z_r$ with $\mathcal{Z}_c = \mathcal{Z}_r = \{0, 1, \lambda\}$, where the test channels are also confined to degraded versions of those that achieve $D^{WZ}(R)$, as shown in Figure 4 for the case $(Y_c, Y_r) - X - Z_r - Z_c$. More specifically,

$$Z_c = E_c \circ (X \oplus S_c)$$
$$Z_r = E_r \circ (X \oplus S_r)$$

where $E_c$, $E_r$, $S_c$, and $S_r$ are all Bernoulli random variables with parameters $q_c$, $q_r$, $\alpha_c$, and $\alpha_r$, respectively. To obtain a Markov relation $X - Z_r - Z_c$, it suffices to enforce $q_c \le q_r$ and $\alpha_c \ge \alpha_r$. In that case, one can find $0 \le q'_c \le 1$ and $0 \le \alpha'_c \le \frac{1}{2}$ such that $q_c = q_r q'_c$ and $\alpha_c = \alpha_r * \alpha'_c$, and $Z_c$ can alternatively be written as

$$Z_c = \begin{cases} E'_c \circ (Z_r \oplus S'_c) & Z_r \neq \lambda \\ \lambda & Z_r = \lambda \end{cases}$$

where $E'_c$ and $S'_c$ are $\mathrm{Ber}(q'_c)$ and $\mathrm{Ber}(\alpha'_c)$, respectively. This results in

$$R^s_{cc} = q_c\, r(\alpha_c, \beta_c)$$
$$R^s_{cr} = q_c\, r(\alpha_c, \beta_r)$$
$$R^s_{rr} = q_r\, r(\alpha_r, \beta_r) - q_c\, r(\alpha_c, \beta_r) .$$

Fig. 4. Auxiliary random variables for binary source coding. The edge labels denote transition probabilities. We also use the convention that $\bar{a} = 1 - a$.

We next make channel variable choices and derive the resulting channel coding rates for CDS and LDS individually. Unlike in the quadratic Gaussian case, there is no power allocation parameter to vary. However, we have freedom in choosing the distributions of $U_c$ and $U_r$ as $\mathrm{Ber}(\gamma_c)$ and $\mathrm{Ber}(\gamma_r)$, respectively, as well as in choosing the auxiliary random variable as either $T = U_c$ or $T = U_c \oplus U_r$.

2) Channel Coding Rates: In this case, with $T = U_c$, (28)-(30) become

$$R^c_{cc} = I(U_c; U_c \oplus U_r \oplus W_c) = r(\gamma_r * p_c, \gamma_c)$$
$$R^c_{cr} = I(U_c; U_c \oplus U_r \oplus W_r) = r(\gamma_r * p_r, \gamma_c)$$
$$R^c_{rr} = I(U_c \oplus U_r; U_c \oplus U_r \oplus W_r | U_c) = I(U_r; U_r \oplus W_r) = r(p_r, \gamma_r) .$$

But since $r(\cdot, \cdot)$ is increasing in its second argument, we have $\gamma_c = \frac{1}{2}$ as the optimal value, achieving

$$R^c_{cc} = 1 - H_2(\gamma_r * p_c) \tag{61}$$
$$R^c_{cr} = 1 - H_2(\gamma_r * p_r) . \tag{62}$$

On the other hand, if $T = U_c \oplus U_r$, we obtain

$$R^c_{cc} = I(U_c \oplus U_r; U_c \oplus U_r \oplus W_c) - I(U_r; U_c \oplus U_r) = r(p_c, \gamma_c * \gamma_r) - r(\gamma_c, \gamma_r) \tag{63}$$
$$R^c_{cr} = I(U_c \oplus U_r; U_c \oplus U_r \oplus W_r) - I(U_r; U_c \oplus U_r) = r(p_r, \gamma_c * \gamma_r) - r(\gamma_c, \gamma_r) \tag{64}$$
$$R^c_{rr} = I(U_r; U_c \oplus U_r, U_c \oplus U_r \oplus W_r) = I(U_r; U_c \oplus U_r) = r(\gamma_c, \gamma_r) . \tag{65}$$
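Both choices of the auxiliary variable can be evaluated side by side. The sketch below again assumes $r(a, b) = H_2(a * b) - H_2(a)$ (inferred from (61)-(62), not quoted from the paper) and implements the rates (61)-(62) with $\gamma_c = \frac{1}{2}$ for $T = U_c$, and (63)-(65) for $T = U_c \oplus U_r$. Function names and sample values are illustrative.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def conv(a, b):
    """Binary convolution a * b = a(1-b) + (1-a)b."""
    return a * (1.0 - b) + (1.0 - a) * b


def r(a, b):
    """Assumed definition r(a, b) = H2(a * b) - H2(a)."""
    return h2(conv(a, b)) - h2(a)


def rates_T_equals_Uc(gamma_r, p_c, p_r):
    """(R_cc, R_cr, R_rr) for T = U_c with gamma_c = 1/2, cf. (61)-(62)."""
    return (1.0 - h2(conv(gamma_r, p_c)),
            1.0 - h2(conv(gamma_r, p_r)),
            r(p_r, gamma_r))


def rates_T_equals_U(gamma_c, gamma_r, p_c, p_r):
    """(R_cc, R_cr, R_rr) for T = U_c xor U_r, cf. (63)-(65)."""
    g = conv(gamma_c, gamma_r)
    leak = r(gamma_c, gamma_r)  # I(U_r; U_c xor U_r)
    return r(p_c, g) - leak, r(p_r, g) - leak, leak


if __name__ == "__main__":
    print(rates_T_equals_Uc(0.1, p_c=0.05, p_r=0.1))
    print(rates_T_equals_U(0.3, 0.2, p_c=0.05, p_r=0.1))
```

For $T = U_c$ the sum $R^c_{cr} + R^c_{rr}$ equals the capacity $1 - H_2(p_r)$ of receiver r's channel, mirroring the Gaussian identity $R^c_{cr} + R^c_{rr} = C_r$.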

C. Performance Comparisons for the Bandwidth Matched Case: k = 1. 

Analytical performance comparisons prove more difficult for the binary Hamming problem. Even the question of which receiver should be designated as c and which as r is not straightforward to answer. That is because (i) there is no power allocation parameter we can control, and (ii) even CDS can produce a curve which could achieve both $D_c = D^{WZ}_c(\kappa C_c)$ and $D_r = D^{WZ}_r(\kappa C_r)$, rather than a single best point.

It is also not clear that our choice of source random variables is the best. As mentioned earlier, our main motivation in adopting the same test channel as in point-to-point coding for LDS is its simplicity. The alphabet size bounds in [15], [17], however, are much higher and therefore it might be possible to further improve the performance of LDS.

Using the same auxiliary random variables in separate coding gives us the following achievable result. We do not have a complete characterization of the distortion tradeoff.

Lemma 4: A distortion pair $(D_b, D_g)$ is achievable if there exist variables $0 \le q_b, q_g \le 1$ and $0 \le \alpha_b, \alpha_g \le \frac{1}{2}$ that satisfy

$$q_b\, r(\alpha_b, \beta_b) \le \kappa [1 - H_2(\theta * p_b)] , \tag{66}$$
$$q_b\, r(\alpha_b, \beta_b) + [q_g\, r(\alpha_g, \beta_g) - q_b\, r(\alpha_b, \beta_g)]^+ \le \kappa [H_2(\theta * p_g) - H_2(p_g)] \quad \text{if } X - Y_g - Y_b , \tag{67}$$
$$q_g\, r(\alpha_g, \beta_g) + [q_b\, r(\alpha_b, \beta_b) - q_g\, r(\alpha_g, \beta_b)]^+ \le \kappa [H_2(\theta * p_g) - H_2(p_g)] \quad \text{if } X - Y_b - Y_g , \tag{68}$$
$$D_i \ge q_i \min\{\alpha_i, \beta_i\} + (1 - q_i) \beta_i , \quad i \in \{b, g\} . \tag{69}$$

The proof is presented in Appendix D.

The performance of the various schemes for certain source-channel pairs at rate $\kappa = 1$ is presented in Figure 5. For LDS, the convex hull of two curves is shown, where in one $c = 2, r = 1$ and in the other $c = 1, r = 2$. In Figures 5(a)-(d), the parameters $\beta_1$, $\beta_2$, and $p_1$ are fixed so that (59) is satisfied for $k = 1$, and $p_2$ is varying. As $p_2$ increases, the collective behavior of the schemes dramatically changes. In Figure 5(a), $c = 1, r = 2$ is consistently the best choice among all schemes. As the quality of the second channel decreases, and reaches the point where (59) is also satisfied for $k = 2$, CDS becomes optimal, as shown in Figure 5(b). When $p_2$ is increased even further, as in Figure 5(c), $c = 2, r = 1$ becomes the better choice. When $p_2$ reaches the point where the first receiver has access to both the better channel and the better side information, as in Figures 5(d) and (e), separate coding and LDS become identical as in the quadratic Gaussian case. However, uncoded transmission can still outperform the LDS, as shown in Figure 5(e) for the case of trivial side information. Finally, Figure 5(f) exemplifies the interesting phenomenon mentioned above, where CDS (and LDS) produces a curve, rather than a point, which happens to be the best.

Fig. 5. Performance comparison for binary sources and channels. In (a)-(d), $\beta_1$, $\beta_2$, and $p_1$ are fixed, and as $p_2$ increases, how all the schemes compare changes. In (e), uncoded transmission is optimal. In (f), CDS and consequently LDS is the best. It is also noteworthy that it touches both trivial converse bounds simultaneously.

VII. Conclusions and Future Work 

We proposed a layered coding scheme for the WZBC problem, and analyzed its distortion performance for the quadratic 
Gaussian and binary Hamming cases. Even though our scheme allows for an arbitrary rate of $\kappa$ channel uses per source symbol, the
achievability regions are easiest to compute for $\kappa = 1$. In fact, for the quadratic Gaussian case, we were able to derive closed
form expressions for the entire distortion tradeoff and show that our layered scheme is always at least as good as (in fact, except 
for one certain case, always better than) separate coding. By numerical comparisons, we observed the same phenomenon for 
the binary Hamming case under the regime where all the test channels are constrained to be of the form which achieves the 
Wyner-Ziv rate-distortion function. On the other hand, our scheme may not always improve over the performance of uncoded 
transmission. This is not surprising, since when there is no (or trivial) side information, it is known that uncoded transmission 
is optimal. 

In an upcoming paper, we combine the digital scheme we proposed with uncoded transmission to extract the benefits of both 
methods. In fact, as we show in a preliminary version [8], the hybrid scheme is more than the sum of its parts and distortions 
outside the convexification of the digital and analog regions are achievable. 

Appendix 

A. Other Layered WZBC Schemes 

The LDS that we focus on in this paper is only one of many possible layered coding schemes based on CDS and CDS with 
DPC. We shall briefly discuss these schemes. In all schemes, the source coding rates are the same as in LDS and only the 
channel coding rates differ. 

• Scheme 1: This scheme is the simplest extension of CDS. The CL is encoded as in CDS. The RL is encoded on top of the CL. At both decoders, the RL is a source of interference while decoding the CL. Once the CL is decoded at the refinement receiver, its effect can be cancelled while decoding the RL. The achievable channel rates are given by the next theorem.

Theorem 3: Let $\mathcal{R}_1(\kappa)$ be the union of all $(R^c_{cc}, R^c_{cr}, R^c_{rr})$ for which there exist $U_c$ in some auxiliary alphabet $\mathcal{U}_c$ and $U \in \mathcal{U}$ with $U_c - U - (V_c, V_r)$ such that

$$R^c_{cc} \le \kappa\, I(U_c; V_c) \tag{70}$$
$$R^c_{cr} \le \kappa\, I(U_c; V_r) \tag{71}$$
$$R^c_{rr} \le \kappa\, I(U; V_r | U_c) . \tag{72}$$

Then $\mathcal{R}_1(\kappa) \subseteq \mathcal{C}_{\mathrm{WZBC}}(\kappa)$.

Proof: Given random variables $U$ and $U_c$ such that $U_c - U - (V_c, V_r)$ and (70)-(72) are satisfied, each $U_c^m(i)$ in the CL channel codebook is chosen uniformly and independently from the typical set $T^m(U_c)$. Similarly, for each $i$, codewords $U^m(j'|i)$ to be transmitted over the channel are chosen uniformly and independently from the conditionally typical set $T^m(U|U_c)$. It then follows from Corollary 1 that (70) and (71) are sufficient for successful decoding of both $Z_c^n(i)$ and $U_c^m(i)$ simultaneously at both decoders. It also follows from standard arguments that (72) is sufficient for reliable transmission of additional information with rate $R^c_{rr}$ to the refinement receiver. ■
• Scheme 2: The CL is encoded as in Scheme 1. The RL, however, is sent using dirty paper coding with the CL codeword as encoder CSI, and is decoded first.

Theorem 4: Let $\mathcal{R}_2(\kappa)$ be the union of all $(R^c_{cc}, R^c_{cr}, R^c_{rr})$ for which there exist $U_c \in \mathcal{U}_c$, $U_r \in \mathcal{U}_r$, and $T \in \mathcal{T}$ with $T - (U_r, U_c) - (V_r, V_c)$ and $(U_r, U_c) - U - (V_r, V_c)$ such that

$$R^c_{cc} \le \kappa\, I(U_c; V_c) \tag{73}$$
$$R^c_{cr} \le \kappa\, I(U_c; T, V_r) \tag{74}$$
$$R^c_{rr} \le \kappa [I(T; V_r) - I(T; U_c)] . \tag{75}$$

Then $\mathcal{R}_2(\kappa) \subseteq \mathcal{C}_{\mathrm{WZBC}}(\kappa)$.

Proof: Since the RL is to be sent by separate source and channel codes, the channel coding part can proceed as in standard dirty paper coding (cf. [6]), if (75) is satisfied. Note that, as in Corollary 2, the auxiliary codeword $T^m$ can also be decoded in the process of decoding the RL. With high probability, this codeword is typical with the CL codeword $U_c^m$ in addition to $V_r^m$. Subsequently, for decoding the CL, the channel output at the r decoder can be taken to be the pair $(V_r^m, T^m)$. Therefore, as in Scheme 1, $Z_c^n(i)$ can be successfully decoded given that (73) and (74) hold. ■
• Scheme 3: The encoding is performed as in LDS, but the decoding order is reversed. Since the RL is decoded first at the r receiver, the CL codeword purely acts as noise. But the r decoder then has access to the RL codeword. So for that receiver, the CSI is also available at the decoder. The following theorem makes use of these observations.

Theorem 5: Let $\mathcal{R}_3(\kappa)$ be the union of all $(R^c_{cc}, R^c_{cr}, R^c_{rr})$ for which there exist $U_c \in \mathcal{U}_c$, $U_r \in \mathcal{U}_r$, and $T \in \mathcal{T}$ with $T - (U_r, U_c) - (V_r, V_c)$ and $(U_r, U_c) - U - (V_r, V_c)$ such that

$$R^c_{cc} \le \kappa [I(T; V_c) - I(T; U_r)] \tag{76}$$
$$R^c_{cr} \le \kappa\, I(T; V_r | U_r) \tag{77}$$
$$R^c_{rr} \le \kappa\, I(U_r; V_r) . \tag{78}$$

Then $\mathcal{R}_3(\kappa) \subseteq \mathcal{C}_{\mathrm{WZBC}}(\kappa)$.

Proof: Since the RL is both encoded and decoded first, (78) is necessary and sufficient for successful decoding of $U_r^m$. Once $U_r^m$ is decoded, the channel between the CL and receiver r reduces to one with input $U^m$, output $(V_r^m, U_r^m)$, and CSI $U_r^m$. It then follows from Theorem 1 that (76) and (77) suffice for reliable transmission of $Z_c^n$. Note that the right-hand side of (77) is equivalent to $\kappa[I(T; U_r, V_r) - I(T; U_r)]$. ■
We now present partial analytical results comparing performances of all the layered schemes.

Lemma 5: It is always true that $\mathcal{R}_2(\kappa) \subseteq \mathcal{R}_1(\kappa)$. Thus Scheme 1 is superior to Scheme 2.

Proof: It suffices to prove the lemma for $\kappa = 1$. Let $(R^c_{cc}, R^c_{cr}, R^c_{rr}) \in \mathcal{R}_2(1)$. Then there must exist $U_c^{(1)}$, $U_r^{(1)}$, $T$, and $U$ with $T - (U_c^{(1)}, U_r^{(1)}) - (V_c, V_r)$ and $(U_c^{(1)}, U_r^{(1)}) - U - (V_c, V_r)$ so that (73)-(75) are satisfied. Now define $U_c^{(2)} = U$ and let

$$R_{cc}^{(1)} = I(U_c^{(1)}; V_c)$$
$$R_{cr}^{(1)} = I(U_c^{(1)}; V_r)$$
$$R_{rr}^{(1)} = I(U; V_r | U_c^{(1)})$$

and

$$R_{cc}^{(2)} = I(U_c^{(2)}; V_c) = I(U; V_c)$$
$$R_{cr}^{(2)} = I(U_c^{(2)}; V_r) = I(U; V_r)$$
$$R_{rr}^{(2)} = I(U; V_r | U_c^{(2)}) = 0 .$$

By definition, both $(R_{cc}^{(1)}, R_{cr}^{(1)}, R_{rr}^{(1)})$ and $(R_{cc}^{(2)}, R_{cr}^{(2)}, R_{rr}^{(2)})$ belong to $\mathcal{R}_1(1)$. So does any convex combination of the two triplets. That is because if we define $Q \sim \mathrm{Ber}(\lambda)$, so that

$$p(q, u_c, u, v_c, v_r) = p(u, v_c, v_r)\, p(q)\, p(u_c | u, q)$$

we can then write any convex combination as

$$R_{cc}^{(\lambda)} = I(U_c^{(Q)}; V_c | Q) = I(U_c^{(Q)}, Q; V_c)$$
$$R_{cr}^{(\lambda)} = I(U_c^{(Q)}; V_r | Q) = I(U_c^{(Q)}, Q; V_r)$$
$$R_{rr}^{(\lambda)} = I(U; V_r | U_c^{(Q)}, Q) .$$

Defining $U_c^{(\lambda)} = (U_c^{(Q)}, Q)$, one can see that $(R_{cc}^{(\lambda)}, R_{cr}^{(\lambda)}, R_{rr}^{(\lambda)}) \in \mathcal{R}_1(1)$.

It is clear that

$$R_{cr}^{(1)} \le I(U_c^{(1)}; T, V_r) . \tag{79}$$

It also follows from the Markov chain $(U_c^{(1)}, T) - U - V_r$ that

$$I(U_c^{(1)}, T; V_r) \le I(U; V_r) . \tag{80}$$

A fact which is not as obvious is

$$R_{cr}^{(1)} + R_{rr}^{(1)} \ge I(U_c^{(1)}; T, V_r) . \tag{81}$$

Towards proving (81), we observe using (80) that

$$I(U; V_r) \ge I(U_c^{(1)}, T; V_r)$$
$$= I(U_c^{(1)}; V_r | T) + I(T; V_r)$$
$$= I(U_c^{(1)}; T, V_r) + I(T; V_r) - I(T; U_c^{(1)}) . \tag{82}$$

But since $R_{cr}^{(1)} + R_{rr}^{(1)} = I(U; V_r)$, and since $I(T; V_r) - I(T; U_c^{(1)}) \ge R^c_{rr} \ge 0$ by (75), this yields (81) directly.

Next, we choose $\lambda$ so that

$$R_{cr}^{(\lambda)} = I(U_c^{(1)}; T, V_r) .$$

That this can always be done follows from (79) and (81) together with the observation that $R_{cr}^{(1)} + R_{rr}^{(1)} = R_{cr}^{(2)}$. We then simultaneously have

$$R_{cc}^{(\lambda)} \ge R^c_{cc} \tag{83}$$
$$R_{cr}^{(\lambda)} \ge R^c_{cr} \tag{84}$$
$$R_{rr}^{(\lambda)} \ge R^c_{rr} . \tag{85}$$

Here, (83) follows from the fact that $R^c_{cc} \le I(U_c^{(1)}; V_c) = R_{cc}^{(1)} \le R_{cc}^{(\lambda)}$. The fact that $R^c_{cr} \le I(U_c^{(1)}; T, V_r) = R_{cr}^{(\lambda)}$ yields (84). Finally, (85) follows because

$$R_{rr}^{(\lambda)} = I(U; V_r) - R_{cr}^{(\lambda)}$$
$$= I(U; V_r) - I(U_c^{(1)}; T, V_r)$$
$$\ge I(T; V_r) - I(T; U_c^{(1)}) \tag{86}$$

where we used (82) in showing (86). ■

It is also easy to show that under the regime where $U = U_c + U_r$, where $+$ is an appropriately defined addition operation with an inverse, i.e., $U_r = U - U_c$, and $U_c$ and $U_r$ are independent, Scheme 1 becomes a special case of LDS. Thus, for both the quadratic Gaussian and the binary Hamming cases, LDS performs at least as well as Scheme 1. To prove this claim, it suffices to pick $T = U_c$ in LDS, which achieves the performance

$$R^c_{cc} = \kappa [I(T; V_c) - I(T; U_r)] = \kappa\, I(U_c; V_c)$$
$$R^c_{cr} = \kappa [I(T; V_r) - I(T; U_r)] = \kappa\, I(U_c; V_r)$$
$$R^c_{rr} = \kappa\, I(U_r; T, V_r)$$
$$= \kappa [I(U_r; U_c) + I(U_r; V_r | U_c)]$$
$$= \kappa\, I(U_r + U_c; V_r | U_c)$$
$$= \kappa\, I(U; V_r | U_c)$$

making Scheme 1 a special case of LDS.

We can also compare the performances of Scheme 3 and LDS for the quadratic Gaussian case with $\kappa = 1$. Using the same random variables as in LDS, (76)-(78) translate to the achievability of

$$R^c_{cc} = I(\gamma U_r + U_c; U_c + U_r + W_c) - I(\gamma U_r + U_c; U_r) = \frac{1}{2}\log\frac{1 + \frac{P}{W_c}}{1 + \bar\nu P\left(\frac{(1-\gamma)^2}{W_c} + \frac{\gamma^2}{\nu P}\right)} \tag{87}$$

$$R^c_{cr} = I(\gamma U_r + U_c; U_c + U_r + W_r \mid U_r) = I(U_c; U_c + W_r) = \frac{1}{2}\log\left(1 + \frac{\nu P}{W_r}\right) \tag{88}$$

$$R^c_{rr} = I(U_r; U_c + U_r + W_r) = \frac{1}{2}\log\left(1 + \frac{\bar\nu P}{\nu P + W_r}\right) \tag{89}$$

where (87) follows from (33). Since the choice of $\gamma$ affects only $R^c_{cc}$, it can be picked so as to maximize $R^c_{cc}$. In fact, this choice coincides with Costa's optimal $\gamma$ for the point-to-point channel between $U_c$ and $V_c$, where the CSI $U_r$ is available at the encoder [3]. In other words, the optimal choice is given by (cf. [3, Equation (7)])

$$\gamma = \frac{\nu P}{\nu P + W_c}$$

yielding

$$R^c_{cc} = \frac{1}{2}\log\left(1 + \frac{\nu P}{W_c}\right). \tag{90}$$
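The claim that Costa's $\gamma$ maximizes the common-layer rate can be checked numerically. The sketch below is only illustrative: it assumes that $U_c$ carries power $\nu P$ and $U_r$ carries power $(1-\nu)P$, uses the form of (87) as it is read here, measures rates in bits, and all function names and parameter values are hypothetical.

```python
import math

def r_cc(nu, gamma, P, W_c):
    """Common-layer rate of Scheme 3 in bits, as a function of gamma (assumed form of (87))."""
    nu_bar = 1.0 - nu
    penalty = nu_bar * P * ((1.0 - gamma) ** 2 / W_c + gamma ** 2 / (nu * P))
    return 0.5 * math.log2((1.0 + P / W_c) / (1.0 + penalty))

def costa_gamma(nu, P, W_c):
    """Costa's choice: signal power over signal-plus-noise power."""
    return nu * P / (nu * P + W_c)

def best_gamma_on_grid(nu, P, W_c, steps=10001):
    """Brute-force maximizer of r_cc over a uniform grid on [0, 1]."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda g: r_cc(nu, g, P, W_c))

nu, P, W_c = 0.4, 2.0, 0.5          # arbitrary example values
g_star = costa_gamma(nu, P, W_c)
g_grid = best_gamma_on_grid(nu, P, W_c)
rate_at_g_star = r_cc(nu, g_star, P, W_c)
```

For these example values the grid maximizer lands next to `g_star`, and `rate_at_g_star` equals $\frac{1}{2}\log_2(1 + \nu P / W_c)$, i.e., the interference from $U_r$ is fully cancelled.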



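Also note that $R^c_{cr} + R^c_{rr} = \frac{1}{2}\log\left(1 + \frac{P}{W_r}\right)$, thereby keeping (47) and (48) valid. That is,

$$D_c = \frac{N_c W_c}{\nu P + W_c} \tag{91}$$

$$D_r = \frac{N_r}{1 + N_r\left(\frac{1}{D_c} - \frac{1}{N_c}\right)} \cdot \frac{\nu P + W_r}{P + W_r}. \tag{92}$$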
Solving for $\nu$ in (91) and substituting it in (92) yields

$$D_r = \frac{N_r W_r}{P + W_r} \cdot \frac{D_c N_c + \frac{W_c}{W_r} N_c (N_c - D_c)}{D_c N_c + N_r (N_c - D_c)} \tag{93}$$

for the entire range

$$\frac{N_c W_c}{P + W_c} \le D_c \le N_c.$$
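The elimination of $\nu$ can be reproduced numerically. The following sketch assumes the forms of (91)-(93) as they are read here (with $\kappa = 1$ and $U_c$ carrying power $\nu P$); the function names and the example values are hypothetical.

```python
def scheme3_d_r_via_nu(P, W_c, W_r, N_c, N_r, D_c):
    """Invert (91) for nu, then evaluate (92)."""
    nu = W_c * (N_c - D_c) / (D_c * P)                 # from D_c = N_c W_c / (nu P + W_c)
    after_common_layer = N_r / (1.0 + N_r * (1.0 / D_c - 1.0 / N_c))
    return after_common_layer * (nu * P + W_r) / (P + W_r)

def scheme3_d_r_closed_form(P, W_c, W_r, N_c, N_r, D_c):
    """Closed form (93), valid for N_c W_c / (P + W_c) <= D_c <= N_c."""
    num = D_c * N_c + (W_c / W_r) * N_c * (N_c - D_c)
    den = D_c * N_c + N_r * (N_c - D_c)
    return N_r * W_r / (P + W_r) * num / den

example = dict(P=1.0, W_c=2.0, W_r=1.0, N_c=1.0, N_r=1.0, D_c=0.8)
d_r_two_step = scheme3_d_r_via_nu(**example)
d_r_direct = scheme3_d_r_closed_form(**example)
```

Both routes give the same $D_r$ for any $D_c$ in the stated range; at the lower end of the range the implied $\nu$ equals 1, and at $D_c = N_c$ it equals 0.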

Lemma 6: For the quadratic Gaussian problem with $\kappa = 1$, the performance of LDS is superior to that of Scheme 3.
Proof:

Let us first compare (93) to (51) for the $W_c > W_r$ case. We shall show for all $\frac{N_c W_c}{P + W_c} \le D_c \le N_c$ that

$$\frac{N_r N_c^2 W_r D_c}{\left[D_c N_c + N_r (N_c - D_c)\right]\left[(W_r - W_c) N_c + (P + W_c) D_c\right]} \le \frac{N_r W_r}{P + W_r} \cdot \frac{D_c N_c + \frac{W_c}{W_r} N_c (N_c - D_c)}{D_c N_c + N_r (N_c - D_c)}$$

or equivalently that

$$D_c N_c (P + W_r) \le \left(D_c + \frac{W_c}{W_r}(N_c - D_c)\right)\left((W_r - W_c) N_c + (P + W_c) D_c\right). \tag{94}$$

Adding $D_c N_c (W_c - W_r)$ to both sides of (94) yields

$$D_c N_c (P + W_c) \le \left(D_c + \frac{W_c}{W_r}(N_c - D_c)\right)(P + W_c) D_c + \frac{W_c}{W_r}(N_c - D_c)(W_r - W_c) N_c. \tag{95}$$

Taking the first term on the right-hand side of (95) to the left-hand side, we obtain

$$D_c (P + W_c)(N_c - D_c)\left(1 - \frac{W_c}{W_r}\right) \le \frac{W_c}{W_r}(N_c - D_c)(W_r - W_c) N_c$$

or equivalently, after dividing both sides by the negative quantity $(N_c - D_c)(W_r - W_c)/W_r$,

$$D_c (P + W_c) \ge W_c N_c$$

which is guaranteed. Equality is satisfied in only three trivial cases: (i) when $D_c = D_c^{WZ}(C_c)$, which coincides with CDS, (ii) when $W_c = W_r$, and (iii) when $D_c = N_c$, which should be excluded if $D_c^{\max} < N_c$.
As for the $W_c < W_r$ case, to prove that LDS is superior, we need to show

$$\frac{N_r N_c^2}{D_c N_c + N_r (N_c - D_c)} \cdot \frac{W_c}{P + W_c} \le \frac{N_r W_r}{P + W_r} \cdot \frac{D_c N_c + \frac{W_c}{W_r} N_c (N_c - D_c)}{D_c N_c + N_r (N_c - D_c)}$$

or equivalently that

$$\frac{N_c W_c}{P + W_c} \le \frac{D_c W_r + W_c (N_c - D_c)}{P + W_r}. \tag{96}$$

Rearranging (96), we have

$$N_c W_c (P + W_r) \le (P + W_c)\left(D_c (W_r - W_c) + W_c N_c\right)$$

which is once again equivalent to

$$D_c (P + W_c) \ge W_c N_c.$$

Equality in this case is satisfied if and only if $D_c = D_c^{WZ}(C_c)$.
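The two inequalities in the proof can also be spot-checked on random parameters. The sketch below is an illustration only: it compares the closed-form expressions for $D_r$ as they are read here (the LDS expressions for the two cases and the Scheme 3 expression (93)), ignores side conditions such as (101), and uses hypothetical function names and sampling ranges.

```python
import random

def d_r_scheme3(P, W_c, W_r, N_c, N_r, D_c):
    """Scheme 3 distortion at the refinement receiver, closed form (93)."""
    num = D_c * N_c + (W_c / W_r) * N_c * (N_c - D_c)
    den = D_c * N_c + N_r * (N_c - D_c)
    return N_r * W_r / (P + W_r) * num / den

def d_r_lds(P, W_c, W_r, N_c, N_r, D_c):
    """LDS distortion at the refinement receiver (left-hand sides in the proof)."""
    den = D_c * N_c + N_r * (N_c - D_c)
    if W_c > W_r:
        return N_r * W_r * N_c ** 2 * D_c / (den * ((W_r - W_c) * N_c + (P + W_c) * D_c))
    return N_r * N_c ** 2 * W_c / (den * (P + W_c))

def max_violation(trials=10000, seed=0):
    """Largest observed value of D_r(LDS) - D_r(Scheme 3); should not be positive."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        P, W_c, W_r, N_c, N_r = (rng.uniform(0.1, 10.0) for _ in range(5))
        D_c = rng.uniform(N_c * W_c / (P + W_c), N_c)   # admissible range of D_c
        gap = d_r_lds(P, W_c, W_r, N_c, N_r, D_c) - d_r_scheme3(P, W_c, W_r, N_c, N_r, D_c)
        worst = max(worst, gap)
    return worst
```

Calling `max_violation()` returns a value that is non-positive up to rounding, consistent with the lemma; a random search of this kind is of course no substitute for the algebraic proof.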
To summarize, for the quadratic Gaussian case with $\kappa = 1$, LDS is provably the best among the schemes considered. In the binary Hamming case, however, LDS is better than both Scheme 1 and Scheme 2, but an analytical comparison with Scheme 3 eluded us. Nevertheless, in an extensive set of numerical evaluations, we did not encounter a single case in which Scheme 3 was better than LDS for the binary Hamming case with $\kappa = 1$.

B. Proof of Lemma 2

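It follows from (47) and (48) that by varying $\nu$ and $\gamma$, we obtain the tradeoff

$$D_c = N_c \, \frac{P\,a(\nu,\gamma) + W_c}{P + W_c} \tag{97}$$

$$D_r = \frac{N_r}{1 + N_r\left(\frac{1}{D_c} - \frac{1}{N_c}\right)} \cdot \frac{1}{1 + \frac{P\,b(\nu,\gamma)}{W_r}} \tag{98}$$

where

$$a(\nu,\gamma) = \bar\nu\left(\frac{W_c}{\nu P}\gamma^2 + (1-\gamma)^2\right)$$

$$b(\nu,\gamma) = \bar\nu\left(\frac{W_r}{\nu P}\gamma^2 + (1-\gamma)^2\right).$$

We next fix $D_c$, which, in turn, fixes $a(\nu,\gamma)$ as

$$a(\nu,\gamma) = \frac{D_c (P + W_c) - W_c N_c}{P N_c} \tag{99}$$

and minimize $D_r$, which reduces to maximizing $b(\nu,\gamma)$. Since neither $R^c_{cc}$ nor $R^c_{cr}$ can be negative, we need both $a(\nu,\gamma) \le 1$ and $b(\nu,\gamma) \le 1$ to be satisfied. The former requirement is guaranteed because we naturally limit ourselves to $D_c \le N_c$. The latter, on the other hand, becomes vacuous since rewriting (46) gives

$$b(\nu,\gamma) \le \frac{N_c\left[P\,a(\nu,\gamma) + W_c\right] - N_r W_r\left[1 - a(\nu,\gamma)\right]}{N_c\left[P\,a(\nu,\gamma) + W_c\right] + P N_r\left[1 - a(\nu,\gamma)\right]} \tag{100}$$

whose right-hand side is always less than or equal to 1.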

Now if $W_c > W_r$, we always have $a(\nu,\gamma) \ge b(\nu,\gamma)$ since

$$b(\nu,\gamma) = a(\nu,\gamma) - \frac{\bar\nu\gamma^2}{\nu P}\left[W_c - W_r\right].$$

Thus, among all choices of $\gamma$ and $\nu$ which satisfy (99), the one that potentially minimizes $D_r$ is $\gamma = 0$ and

$$\nu = \frac{(P + W_c)(N_c - D_c)}{P N_c}.$$

That is because with this choice we have $b(\nu,\gamma) = a(\nu,\gamma)$. It then remains to check (100), which can be written after some algebra as

$$N_c N_r \left[W_c - W_r\right] \ge D_c \left[P + W_c\right]\left[N_r - N_c\right].$$

This is granted if $N_r \le N_c$ and is equivalent to

$$D_c \le \frac{N_c N_r \left[W_c - W_r\right]}{\left[P + W_c\right]\left[N_r - N_c\right]} \tag{101}$$

if $N_r > N_c$. The constraint (101), on the other hand, is in effect only if

$$N_c (P + W_c) < N_r (P + W_r)$$

for otherwise, it is trivially satisfied because $D_c \le N_c$. Substituting $b(\nu,\gamma) = a(\nu,\gamma)$ in (43) yields

$$D_r = \frac{N_r W_r N_c^2 D_c}{\left(D_c N_c + N_r (N_c - D_c)\right)\left((W_r - W_c) N_c + (P + W_c) D_c\right)}.$$
On the other hand, if $W_c < W_r$, it is more helpful to write

$$b(\nu,\gamma) = \frac{W_r}{W_c}\,a(\nu,\gamma) - \bar\nu(1-\gamma)^2\left[\frac{W_r}{W_c} - 1\right]$$

as this reveals $b(\nu,\gamma) \le \frac{W_r}{W_c}\,a(\nu,\gamma)$. Thus, the optimal choice of parameters is potentially $\gamma = 1$ and

$$\nu = \frac{W_c N_c}{D_c \left[P + W_c\right]}$$

provided this choice satisfies (100). Once again, after some algebra, that translates to

$$D_c \le \frac{N_c P\left[W_c N_c - W_r N_r\right] + W_c W_r N_c\left[N_c - N_r\right]}{\left[P + W_c\right]\left[N_c - N_r\right] W_r}.$$

Substituting $b(\nu,\gamma) = \frac{W_r}{W_c}\,a(\nu,\gamma)$ in (43) yields

$$D_r = \frac{N_r N_c^2 W_c}{\left(D_c N_c + N_r (N_c - D_c)\right)(P + W_c)}.$$

Combining all the above results yields (51) and (52).
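The parameter choices in this proof can be traced numerically. The sketch below is a hedged illustration: it assumes the forms of $a(\nu,\gamma)$, $b(\nu,\gamma)$, (97) and (98) as they are read here, with $\kappa = 1$ and $D_c < N_c$; the function names and the example values are hypothetical.

```python
def a_b(nu, gamma, P, W_c, W_r):
    """The auxiliary quantities a(nu, gamma) and b(nu, gamma); requires nu > 0."""
    nu_bar = 1.0 - nu
    a = nu_bar * (W_c / (nu * P) * gamma ** 2 + (1.0 - gamma) ** 2)
    b = nu_bar * (W_r / (nu * P) * gamma ** 2 + (1.0 - gamma) ** 2)
    return a, b

def tradeoff(nu, gamma, P, W_c, W_r, N_c, N_r):
    """Distortion pair (D_c, D_r) from the assumed forms of (97) and (98)."""
    a, b = a_b(nu, gamma, P, W_c, W_r)
    D_c = N_c * (P * a + W_c) / (P + W_c)
    D_r = N_r / (1.0 + N_r * (1.0 / D_c - 1.0 / N_c)) / (1.0 + P * b / W_r)
    return D_c, D_r

def lds_parameters(P, W_c, W_r, N_c, D_c):
    """Candidate (nu, gamma) for a target D_c: gamma = 0 if W_c > W_r, else gamma = 1."""
    if W_c > W_r:
        return (P + W_c) * (N_c - D_c) / (P * N_c), 0.0
    return W_c * N_c / (D_c * (P + W_c)), 1.0

# Example with W_c < W_r and N_c > N_r, chosen so that the constraint on D_c holds.
P, W_c, W_r, N_c, N_r, D_c_target = 1.0, 1.0, 2.0, 1.0, 0.2, 0.6
nu, gamma = lds_parameters(P, W_c, W_r, N_c, D_c_target)
D_c, D_r = tradeoff(nu, gamma, P, W_c, W_r, N_c, N_r)
```

For this example `D_c` reproduces the target and `D_r` matches the closed form obtained at the end of the $W_c < W_r$ case; the same functions reproduce the $W_c > W_r$ closed form when the noise variances are swapped.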

C. Proof of Lemma 3

The Gaussian broadcast channel capacity is achieved by Gaussian $U_b$ and $U - U_b$ with $U_b \perp U - U_b$ (cf. [4]). Let $0 \le \nu \le 1$ and $\bar\nu = 1 - \nu$ control the power allocation between $U_b$ and $U - U_b$. The source rate-distortion function is similarly achieved by the test channel $X = Z_k + S_k$ with $Z_k \perp S_k$ for $k = b, g$. For these choices, (12), (13), (15) and (16) can be combined to give the following characterization of achievable distortions for general $\kappa$:



+ 

N 2 N / z/P \ K ( vP\ K 

D, [N g N t + V(W. - N.)l * (' + ^) I 1 + "-li-li («D) 

The key to the proof is the observation that for optimal performance, (102) needs to be satisfied with equality for any $\kappa$. To see this, assume that $(D_b, D_g)$ with $D_b < N_b$ satisfies (102) with strict inequality for some $0 \le \nu \le 1$. Then one can decrease $\nu$ until equality is obtained in (102), and still satisfy (103) or (104), depending on whether $X - Y_g - Y_b$ or $X - Y_b - Y_g$, respectively. That, in turn, follows because the right-hand sides of both (103) and (104) are decreasing in $\nu$. Thus, if (102) is not tight, one can keep $D_b$ the same while decreasing $D_g$.
When $\kappa = 1$, equality in (102) translates to

$$\nu = \frac{(P + W_b)(N_b - D_b)}{P N_b}.$$



For the case $X - Y_g - Y_b$, (103) then becomes

$$D_g \ge \frac{N_g N_b^2 W_g D_b}{\left(D_b N_b + N_g (N_b - D_b)\right)\left((W_g - W_b) N_b + (P + W_b) D_b\right)}.$$

If $X - Y_b - Y_g$, on the other hand, (104) implies

$$D_g \ge \frac{N_g W_g D_b}{(W_g - W_b) N_b + (P + W_b) D_b}$$

and

$$D_g \ge \frac{N_b N_g \left(N_g W_g - (W_g - W_b) N_b - (P + W_b) D_b\right)}{\left((W_g - W_b) N_b + (P + W_b) D_b\right)(N_g - N_b)}$$

simultaneously, which is the desired result.

D. Proof of Lemma 

For the binary symmetric channel, $C(\kappa)$ is achieved by $U_b \sim \mathrm{Ber}(\tfrac{1}{2})$ and $U = U_b \oplus U_g$ with $U_g \sim \mathrm{Ber}(\theta)$ and $U_g$ independent of $U_b$. The parameter $\theta$ serves as a tradeoff between $R_b$ and $R_g$. The conditions (12) and (13) then become (cf. [4])

$$R_b \le \kappa\left[1 - H_2(\theta * p_b)\right] \tag{105}$$

$$R_g \le \kappa\left[H_2(\theta * p_g) - H_2(p_g)\right]. \tag{106}$$
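To make the role of $\theta$ concrete, the following sketch evaluates the right-hand sides of (105) and (106) for a given $\theta$. It assumes the standard definitions of the binary entropy function $H_2$ and of the binary convolution $a * b = a(1-b) + (1-a)b$, and rates in bits; the function names are illustrative.

```python
import math

def h2(p):
    """Binary entropy function in bits, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def conv(a, b):
    """Binary convolution a * b: crossover probability of two cascaded BSCs."""
    return a * (1.0 - b) + (1.0 - a) * b

def bsc_broadcast_rates(theta, p_b, p_g, kappa=1.0):
    """Upper bounds on (R_b, R_g) for crossover probabilities p_b and p_g."""
    r_b = kappa * (1.0 - h2(conv(theta, p_b)))
    r_g = kappa * (h2(conv(theta, p_g)) - h2(p_g))
    return r_b, r_g

# theta = 0 gives all the rate to the common layer, theta = 1/2 to the refinement.
corner_common = bsc_broadcast_rates(0.0, p_b=0.2, p_g=0.05)
corner_refine = bsc_broadcast_rates(0.5, p_b=0.2, p_g=0.05)
```

Sweeping $\theta$ over $[0, \tfrac{1}{2}]$ traces the boundary between these two corner points.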

For the source coding part, we evaluate $\mathcal{R}^*(D_b, D_g)$ only with the auxiliary random variables chosen as in Section VI-B, where subscripts $c$ and $r$ are to be replaced by $b$ and $g$ or by $g$ and $b$.

These simple choices may potentially result in degradation of the separate coding performance, as the bounds on the alphabet 
sizes for Z b and Z g in [15], [17], [18] are much larger. However, our limited choice of (Z b , Z g ) can be justified in two ways: 
(i) to the best of our knowledge, there is no other choice known to achieve better rates, and (ii) to be fair, we use the same 
choice in our joint source-channel coding schemes. 

As in the quadratic Gaussian case, we can write 

I{X;Z k \Y k ,) = q k r(a k ,P k ,) (107) 

for k, k' e {b, g}. Combining (Q3]), dl05> , and (1107t yields (|66]l. Similarly, combining (O, (1106l l, and dl07| ), we obtain 
when X -Y g - Y b , and (gl when X - Y b - Y g . 